"nn criterions don't compute the gradient w.r.t. targets" error

But why throw an error?

I came across a use case where I needed to minimize the MSE between intermediate features of an auto-encoder: both the input and the target need to be differentiated here.

I had to trade nn.MSELoss()(encoder_i, decoder_i) for torch.sum((encoder_i - decoder_i)**2), which also does the job. However, I'm not 100% sure I didn't lose something with this fix (efficiency?). I don't understand why such use of nn losses is not permitted.
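For reference, here is a minimal sketch of the workaround, assuming `encoder_i` and `decoder_i` are intermediate feature tensors that both require gradients (the tensor names and shapes are just placeholders). Note that `nn.MSELoss` defaults to the `'mean'` reduction, so to match its value exactly the manual sum should be divided by the number of elements:

```python
import torch

# Placeholder intermediate features; in the real model these would come
# from the encoder and decoder, and both carry gradients.
encoder_i = torch.randn(8, 16, requires_grad=True)
decoder_i = torch.randn(8, 16, requires_grad=True)

# Manual MSE: differentiable w.r.t. BOTH tensors.
# Dividing by numel() matches nn.MSELoss's default 'mean' reduction;
# without it you get the 'sum' reduction instead.
loss = torch.sum((encoder_i - decoder_i) ** 2) / encoder_i.numel()
loss.backward()

# Gradients flow to both sides, which is exactly what the
# nn criterion refused to do for its target argument.
print(encoder_i.grad is not None, decoder_i.grad is not None)
```

Efficiency-wise this builds a couple of extra intermediate tensors (the difference and its square) instead of using a fused loss kernel, but for typical feature sizes the overhead should be negligible.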
