But why throw an error?
I came across a use case where I needed to minimize the MSE between intermediate features of an auto-encoder: both the input and the target need gradients here.
I had to trade nn.MSELoss()(encoder_i, decoder_i)
for torch.sum((encoder_i - decoder_i)**2)
which also does the job. However, I’m not 100% sure I didn’t lose something with this fix (efficiency?). I don’t understand why this use of the nn losses is not permitted.
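For reference, here is a minimal sketch of the workaround. It assumes encoder_i and decoder_i are ordinary tensors that require grad; the random tensors below are hypothetical stand-ins, not my actual features. It checks that gradients reach both sides, and that the sum version differs from a mean-reduced MSE (the default reduction of nn.MSELoss) only by a constant factor equal to the number of elements.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the intermediate encoder/decoder features.
encoder_i = torch.randn(4, 8, requires_grad=True)
decoder_i = torch.randn(4, 8, requires_grad=True)

diff = encoder_i - decoder_i
loss_sum = torch.sum(diff ** 2)    # the workaround from the post
loss_mean = torch.mean(diff ** 2)  # same scale as a mean-reduced MSE

# Backpropagate through the hand-written loss; no special handling of the target.
loss_mean.backward()

# Both tensors receive gradients, and they are equal and opposite.
grad_ok = torch.allclose(encoder_i.grad, -decoder_i.grad)

# The sum version is the mean version times the number of elements.
scale_ok = torch.allclose(loss_sum, loss_mean * diff.numel())
```

So if the sum replaces a mean-reduced loss, the gradients are scaled by the element count, which may call for a different learning rate or loss weight.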