Using the ASGD optimizer in a CNN

Hello everyone,

I am using PyTorch to build a CNN that classifies DNA sequences. I tried both Adam and SGD as the optimizer, but the accuracy did not improve whether I increased or decreased the learning rate. Then I found another optimizer, ASGD, and switching to it with its default parameters immediately improved my model's accuracy:

optimizer = torch.optim.ASGD(model.parameters(), lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)
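
To get a feel for what these defaults do, I wrote a small pure-Python sketch of the step-size and averaging schedule as I understand it from the ASGD documentation. The function name `asgd_schedule` is my own, and the two formulas are my reading of the update rule rather than code copied from PyTorch, so please correct me if they are off. If they are right, then with `t0=1000000.0` the averaging weight `mu` stays at 1 (meaning no averaging yet) for the first million steps, and only the slowly decaying step size `eta` changes.

```python
def asgd_schedule(step, lr=0.01, lambd=1e-4, alpha=0.75, t0=1e6):
    """Return (eta, mu) after `step` updates.

    Assumption: this follows my reading of the ASGD update rule;
    it is an illustration, not PyTorch's actual implementation.
    """
    # Step size decays slowly with the number of updates.
    eta = lr / (1 + lambd * lr * step) ** alpha
    # Averaging weight: stays at 1 until `step` exceeds t0 by more than 1.
    mu = 1 / max(1, step - t0)
    return eta, mu


if __name__ == "__main__":
    # Print the schedule at a few step counts around t0.
    for step in (1, 1_000, 100_000, 1_000_000, 2_000_000):
        eta, mu = asgd_schedule(step)
        print(f"step={step:>9,}  eta={eta:.6f}  mu={mu:.2e}")
```

My training runs are far shorter than a million steps, so if this sketch is correct the averaging part of ASGD would not have kicked in at all in my experiments.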

I would like to know whether ASGD is commonly used. Adam and SGD seem to be everywhere, but I rarely see ASGD, and since I am new to deep learning I am not sure how far to trust this result. Does anyone have experience using it with CNNs, and could you tell me its advantages and disadvantages?

Thanks!