I hope my title makes sense. I would like to know whether the following two snippets of code are equivalent or not:
```python
criterion = nn.MSELoss()
output = net(input)
loss = criterion(output, target)
loss.backward()
```
Here output is a tensor containing multiple values. Would the next snippet of code be identical?
```python
criterion = nn.MSELoss()
output = net(input)
loss = 0
for (out, tar) in zip(output, target):
    loss += criterion(out, tar)
loss.backward()
```
Now assume I am not using MSELoss but a custom loss function I wrote myself. Would this still hold?
The result should be the same if you use sum as the reduction type (reduction='sum' for nn.MSELoss). So it also depends on which reduction you are using in your custom loss function.
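To make the reduction point concrete, here is a minimal sketch showing that with reduction='sum' the whole-batch loss equals the accumulated per-sample losses, while with the default reduction='mean' it generally does not (the per-sample means get summed instead of averaged over the full batch):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(4, 3)
target = torch.randn(4, 3)

sum_crit = nn.MSELoss(reduction='sum')
mean_crit = nn.MSELoss(reduction='mean')

# Whole-batch loss vs. per-sample accumulation with sum reduction
batch_sum = sum_crit(output, target)
looped_sum = sum(sum_crit(o, t) for o, t in zip(output, target))
print(torch.allclose(batch_sum, looped_sum))  # True

# With mean reduction the loop sums 4 per-sample means (each over 3
# elements), but the batch version averages over all 12 elements at once
batch_mean = mean_crit(output, target)
looped_mean = sum(mean_crit(o, t) for o, t in zip(output, target))
print(torch.allclose(batch_mean, looped_mean))  # False
```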
Here is an example code snippet using nn.CrossEntropyLoss. Note that I called model.eval() to get the same outputs; otherwise the first forward pass would update the batchnorm running stats, which would yield a small difference.
```python
import torch
import torch.nn as nn
import torchvision.models as models

# eval() keeps the batchnorm running stats fixed between the two forward passes
model = models.resnet18().eval()
x = torch.randn(10, 3, 224, 224)
target = torch.randint(0, 1000, (10,))
criterion = nn.CrossEntropyLoss(reduction='sum')

# Approach 1: single loss over the whole batch
output = model(x)
loss = criterion(output, target)
loss.backward()

grads1 = []
for param in model.parameters():
    grads1.append(param.grad.clone())

model.zero_grad()

# Approach 2: per-sample losses accumulated in a loop
output = model(x)
loss = 0
for o, t in zip(output, target):
    loss += criterion(o.unsqueeze(0), t.unsqueeze(0))
loss.backward()

grads2 = []
for param in model.parameters():
    grads2.append(param.grad.clone())

# Compare the gradients of both approaches
for g1, g2 in zip(grads1, grads2):
    if not torch.allclose(g1, g2):
        print('mismatch!')
```
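The same reasoning carries over to a custom loss function: as long as it reduces by summing over the batch, the looped version accumulates to the same total and hence the same gradients. A minimal sketch, where my_loss is an illustrative sum-reduced squared error and not code from the original post:

```python
import torch

def my_loss(output, target):
    # Custom loss with sum reduction: summed squared error
    return ((output - target) ** 2).sum()

output = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)

# Whole batch at once
loss_a = my_loss(output, target)
ga, = torch.autograd.grad(loss_a, output)

# Per-sample accumulation
loss_b = sum(my_loss(o, t) for o, t in zip(output, target))
gb, = torch.autograd.grad(loss_b, output)

print(torch.allclose(ga, gb))  # True
```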