The result should be the same if you use sum as the reduction type (reduction='sum' for nn.MSELoss).
So it also depends on which reduction you are using in your custom loss function.
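As a minimal sketch of just the reduction behavior (the tensor shapes here are made up for illustration), summing the per-sample nn.MSELoss values should match the batch loss when reduction='sum', while the default reduction='mean' would normalize by the number of elements and thus differ:

import torch
import torch.nn as nn

output = torch.randn(8, 5)
target = torch.randn(8, 5)

criterion_sum = nn.MSELoss(reduction='sum')
loss_batch = criterion_sum(output, target)

# summing the per-sample losses gives the same value as the batch loss
loss_manual = sum(criterion_sum(o, t) for o, t in zip(output, target))
print(torch.allclose(loss_batch, loss_manual))  # True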
Here is an example code snippet using nn.CrossEntropyLoss.
Note that I called model.eval() to get the same outputs in both runs; otherwise the first forward pass would update the running statistics of the batchnorm layers, which would yield a small difference in the second pass.
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18().eval()
x = torch.randn(10, 3, 224, 224)
target = torch.randint(0, 1000, (10,))
criterion = nn.CrossEntropyLoss(reduction='sum')

# 1) single forward/backward pass on the whole batch
output = model(x)
loss = criterion(output, target)
loss.backward()
grads1 = []
for param in model.parameters():
    grads1.append(param.grad.clone())

model.zero_grad()

# 2) accumulate the loss sample by sample, then backprop once
output = model(x)
loss = 0
for o, t in zip(output, target):
    loss += criterion(o.unsqueeze(0), t.unsqueeze(0))
loss.backward()
grads2 = []
for param in model.parameters():
    grads2.append(param.grad.clone())

# compare the gradients of both approaches
for g1, g2 in zip(grads1, grads2):
    if not torch.allclose(g1, g2):
        print('mismatch!')