I tried 2 kinds of computing multi-losses methods in the deeply-supervised nets.
The first one needs to add retain_graph=True in loss.backward() while the second doesn’t and I don’t understand why.

...
output = model(data)
loss = 0
for i, out in enumerate(output):
...
loss+=(F.nll_loss(out, target))
loss.backward()

...
output = model(data)
loss = 0
loss1 = []
for i, out in enumerate(output):
...
loss1.append(F.nll_loss(out, target))
loss = reduce(lambda x, y: x+y, loss1)
loss.backward()

The first code snippet runs fine without retain_graph=True using your code as the base:

model = nn.Linear(1, 1)
data = torch.randn(10, 1)
target = torch.zeros(10).long()
output = model(data)
loss = 0.
for out, tar in zip(output, target):
loss += F.nll_loss(out.unsqueeze(0), tar.unsqueeze(0))
loss.backward()
print(model.weight.grad)

It seems that the model you defined is rather simple.
My model looks like this, which return losses from different stages.
Maybe that’s where the problem arises?