Why doesn't `loss.backward()` update the parameters' gradients?

`tensor.mean(1)` calculates the mean of the tensor along dim 1 (reducing that dimension). You could use this approach on an output activation.
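A minimal sketch of what `mean(1)` does, using a made-up activation tensor (the shape here is just for illustration):

```python
import torch

# Hypothetical output activation: batch of 4 samples, 3 values each
acts = torch.arange(12, dtype=torch.float32).reshape(4, 3)

# mean(1) averages along dim 1, collapsing each row to a single value
row_means = acts.mean(1)
print(row_means.shape)   # torch.Size([4])
print(row_means)         # tensor([ 1.,  4.,  7., 10.])
```

Note the result drops dim 1 entirely; pass `keepdim=True` if you need the reduced dimension kept as size 1.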