I think I’ve found the problem.
According to this thread: Why "loss.backward()" didn't update parameters' gradient?
Adding a batch normalization layer could solve this.
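Below is a minimal sketch of what that could look like. The model, layer sizes, and dummy data here are my own assumptions for illustration (not the original code); the point is just where an nn.BatchNorm1d layer would go and how to check that loss.backward() actually populates each parameter's .grad:

```python
import torch
import torch.nn as nn

# Hypothetical model: a BatchNorm1d layer inserted between the linear layers.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),  # the added batch normalization layer
    nn.ReLU(),
    nn.Linear(32, 1),
)

x = torch.randn(8, 10)       # dummy batch of 8 samples
target = torch.randn(8, 1)   # dummy targets

loss = nn.MSELoss()(model(x), target)
loss.backward()

# Check whether backward() actually wrote gradients to every parameter.
for name, p in model.named_parameters():
    print(name, "grad is None" if p.grad is None else "grad populated")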