Adam works well but SGD is not working at all

Hi, I’m working on the binarized neural network proposed here: binarynet, using PyTorch. I saw there’s already a version available on GitHub, BinarynetPytorch, which uses Adam as the optimizer. I tried changing it to the SGD optimizer; however, the network then stopped training entirely.

I changed the initial learning rate from the 5e-3 used for Adam to 5e-1 for SGD. I also played with various momentum values and batch sizes. Unfortunately, none of them worked: the loss just fluctuated around one value and never dropped.
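Concretely, the change I made looks roughly like this (a minimal sketch; the stand-in model is just for illustration, and the lr/momentum values are the ones I tried, not tuned recommendations):

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for the binarized network from the repo.
model = nn.Linear(10, 2)

# What the repo uses:
optimizer = optim.Adam(model.parameters(), lr=5e-3)

# My change: swap in SGD with a larger lr and momentum.
optimizer = optim.SGD(model.parameters(), lr=5e-1, momentum=0.9)
```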

Is there anything I should do or check?

Thanks!

I changed the initial learning rate from the 5e-3 used for Adam to 5e-1 for SGD

This always depends on the experiment, but 5e-1 seems very high for a learning rate. Have you tried smaller values?
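To illustrate, a quick sweep over a few orders of magnitude can show whether any lr gets the loss moving at all (toy model and data below; substitute your BNN and your data loader):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy setup just to illustrate the sweep; plug in your own network and data.
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
loss_fn = nn.CrossEntropyLoss()

for lr in [1e-4, 1e-3, 1e-2, 1e-1]:
    model = nn.Linear(10, 2)  # fresh model per lr
    opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(100):  # a few quick steps per lr
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"lr={lr:.0e}  final loss={loss.item():.4f}")
```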

Yes, I’ve tried learning rates across a wide range, but none of them worked with SGD. I wonder whether the binarized network was implemented in a way that simply doesn’t work with SGD?
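One thing I plan to check is the gradient scale SGD actually sees, since Adam adapts per-parameter step sizes and plain SGD doesn’t. A small helper I could drop into my loop (hypothetical, not from the repo; call it after loss.backward() and before optimizer.step()):

```python
import torch

def grad_norm(model: torch.nn.Module) -> float:
    """Global L2 norm of all parameter gradients (call after backward())."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5
```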