@albanD Please correct me if I am wrong. It seems like this is because of the way PyTorch's SGD differs from other frameworks (e.g. Caffe).

I am porting a network from Caffe and am trying to understand why, if I increase the `lr` (after a certain epoch), the network always becomes unstable (`inf` weights and `nan` loss).
~~It seems like PyTorch's SGD is more sensitive to `lr` changes because the `lr` is applied to the velocity instead of the gradients. Is there any particular reason for this choice?~~
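For context, the two momentum formulations differ in where `lr` enters the update. A minimal sketch of both rules (the function names and in-place style here are mine, just to make the contrast concrete):

```python
import torch

def pytorch_style_step(p, v, lr, momentum):
    # PyTorch's SGD: lr multiplies the whole velocity at apply time,
    # so raising lr instantly rescales the accumulated momentum too.
    v.mul_(momentum).add_(p.grad)        # v = momentum * v + g
    p.data.add_(v, alpha=-lr)            # p = p - lr * v

def caffe_style_step(p, v, lr, momentum):
    # Caffe/Sutskever SGD: lr scales only the fresh gradient, so an
    # lr change blends into the velocity gradually as momentum decays.
    v.mul_(momentum).add_(p.grad, alpha=-lr)  # v = momentum * v - lr * g
    p.data.add_(v)                            # p = p + v
```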
EDIT: I added an SGD that is more like other frameworks, and the network still becomes unstable if I increase the `lr`. Decreasing the `lr` is always fine.
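For reference, a Caffe-style SGD can be written as an `Optimizer` subclass roughly like this (a sketch, not necessarily the exact variant I tested; the class name and default momentum are illustrative):

```python
import torch
from torch.optim.optimizer import Optimizer

class CaffeStyleSGD(Optimizer):
    # Hypothetical Caffe-style SGD: v = momentum * v - lr * g; p = p + v
    def __init__(self, params, lr, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, momentum = group["lr"], group["momentum"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                # per-parameter velocity buffer, created lazily
                buf = self.state[p].setdefault("v", torch.zeros_like(p))
                # lr scales only the new gradient contribution
                buf.mul_(momentum).add_(p.grad, alpha=-lr)
                p.add_(buf)
```

Usage would be a drop-in replacement, e.g. `opt = CaffeStyleSGD(model.parameters(), lr=0.01, momentum=0.9)`.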