nn criterions don't compute the gradient w.r.t. targets

By default, the criterions in the nn package indeed don't.

If you write MSE yourself out of ordinary tensor ops, like this:

import torch

def mse_loss(input, target):
    return torch.sum((input - target) ** 2) / input.numel()  # ** is power; ^ is bitwise XOR

Then you can indeed compute the gradient w.r.t. both input and target, as long as target has requires_grad=True.
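Here's a quick way to check that, as a minimal sketch. It assumes a reasonably recent PyTorch where tensors take `requires_grad` directly, and the tensor values are made up for illustration:

```python
import torch

def mse_loss(input, target):
    # Mean of squared differences, built only from differentiable tensor ops.
    return torch.sum((input - target) ** 2) / input.numel()

# Hypothetical example values; both tensors opt into gradient tracking.
input = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
target = torch.tensor([0.0, 2.0, 5.0], requires_grad=True)

loss = mse_loss(input, target)
loss.backward()

# d(loss)/d(input) = 2 * (input - target) / N; the target gradient is its negative.
print(input.grad)
print(target.grad)
```

Both `.grad` fields get populated, and the gradient for target should simply be the negative of the one for input, since the loss only depends on their difference.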
