Adding losses of items in the output array vs. loss on the entire array

I hope my title makes sense. I would like to know whether the following two snippets of code are equivalent or not:

criterion = nn.MSELoss()
output = net(input)
loss = criterion(output, target)
loss.backward()

Here, output is a tensor containing multiple values. Would the next snippet of code behave identically?

criterion = nn.MSELoss()
output = net(input)
loss = 0
for (out, tar) in zip(output, target):
    loss += criterion(out, tar)
loss.backward()

Now assume I am not using MSELoss but a custom loss function I wrote myself. Would this still hold?

The result should be the same if you use sum as the reduction type (reduction='sum' for nn.MSELoss). With the default reduction='mean', the first snippet averages over all elements in the batch, while the loop sums the per-sample means, so the two losses (and their gradients) would differ by a factor of the batch size. So it also depends on which reduction you are using in your custom loss function.
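
For example, here is a quick sanity check with nn.MSELoss (a minimal sketch; the tensor shapes are arbitrary placeholders):

import torch
import torch.nn as nn

output = torch.randn(4, 3)
target = torch.randn(4, 3)

# with reduction='sum', summing per-sample losses matches the full loss
criterion_sum = nn.MSELoss(reduction='sum')
full = criterion_sum(output, target)
looped = sum(criterion_sum(o, t) for o, t in zip(output, target))
print(torch.allclose(full, looped))  # True

# with the default reduction='mean', the loop sums per-sample means,
# while the single call averages over all elements, so they differ
criterion_mean = nn.MSELoss()
full_mean = criterion_mean(output, target)
looped_mean = sum(criterion_mean(o, t) for o, t in zip(output, target))
print(torch.allclose(full_mean, looped_mean))  # False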

Here is an example code snippet using nn.CrossEntropyLoss.
Note that I called model.eval() to get the same outputs in both forward passes. Otherwise the first forward pass would update the batchnorm running statistics, which would yield a small difference in the second pass.

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18().eval()

x = torch.randn(10, 3, 224, 224)
target = torch.randint(0, 1000, (10,))

criterion = nn.CrossEntropyLoss(reduction='sum')

output = model(x)
loss = criterion(output, target)
loss.backward()

# store the gradients from the full-batch loss
grads1 = []
for param in model.parameters():
    grads1.append(param.grad.clone())

# reset the gradients and repeat with the loss summed over single samples
model.zero_grad()
output = model(x)
loss = 0
for o, t in zip(output, target):
    loss += criterion(o.unsqueeze(0), t.unsqueeze(0))
loss.backward()

# store the gradients from the looped loss
grads2 = []
for param in model.parameters():
    grads2.append(param.grad.clone())


# compare both gradient sets; nothing is printed if they all match
for g1, g2 in zip(grads1, grads2):
    if not torch.allclose(g1, g2):
        print('mismatch!')
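
For a custom loss function the same reasoning applies, as long as you control the reduction explicitly. Here is a minimal sketch (my_loss and its signature are just an illustration, not a fixed API):

def my_loss(output, target, reduction='mean'):
    # elementwise squared error as a stand-in for any custom computation
    loss = (output - target) ** 2
    if reduction == 'sum':
        return loss.sum()
    if reduction == 'mean':
        return loss.mean()
    return loss  # reduction='none'

As with nn.MSELoss, summing per-sample calls with reduction='sum' will match a single call on the whole batch, while reduction='mean' will not.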