Loss not converging with Adagrad

Hi. I am new to PyTorch and trying to implement various optimizer algorithms with it. While implementing Adagrad, the loss does not converge as expected. I inspected the gradients of the layers on different batches, and they become zero very early in training. I cannot figure out why the network does not converge. Is there a bug in my implementation? Can you please point me in the right direction?

I have included a picture of my implementation.
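In case the screenshot does not render, here is a minimal sketch of the kind of hand-rolled Adagrad step I am aiming for (the class name `MyAdagrad` and the `lr`/`eps` arguments are placeholders, not necessarily what my screenshot shows):

```python
import torch

class MyAdagrad:
    """Minimal Adagrad: accumulate squared gradients per parameter and
    scale each parameter's learning rate by 1 / (sqrt(accumulator) + eps)."""

    def __init__(self, params, lr=0.01, eps=1e-10):
        self.params = list(params)
        self.lr = lr
        self.eps = eps
        # One accumulator per parameter, holding the running sum of squared gradients.
        self.state = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, acc in zip(self.params, self.state):
            if p.grad is None:
                continue
            # Accumulate the element-wise square of the gradient.
            acc.add_(p.grad * p.grad)
            # Per-parameter scaled update.
            p.add_(-self.lr * p.grad / (acc.sqrt() + self.eps))

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()
```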

I think a good way to check your custom implementation would be to compare its gradients and parameter updates with those of the PyTorch implementation.
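A rough sketch of such a check (assuming a custom optimizer with the usual `step()`/`zero_grad()` interface, e.g. the `MyAdagrad` sketch above; the model, loss, and data here are just placeholders): run two identical copies of the model, step one with the custom optimizer and one with `torch.optim.Adagrad`, and compare the parameters after each step.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two identical copies of a small model.
model_ref = nn.Linear(10, 1)
model_custom = copy.deepcopy(model_ref)

opt_ref = torch.optim.Adagrad(model_ref.parameters(), lr=0.1)
opt_custom = MyAdagrad(model_custom.parameters(), lr=0.1)  # custom implementation

criterion = nn.MSELoss()
x = torch.randn(8, 10)
y = torch.randn(8, 1)

for step in range(5):
    for model, opt in ((model_ref, opt_ref), (model_custom, opt_custom)):
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        opt.step()
    # If the custom update rule matches, the parameters should stay (numerically) close.
    for p_ref, p_cus in zip(model_ref.parameters(), model_custom.parameters()):
        print(step, torch.allclose(p_ref, p_cus, atol=1e-6))
```

If the parameters start to drift apart after the first step, the bug is in the update rule itself rather than in the gradient computation.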

Thanks @ptrblck, I figured out the issue: I was calculating the square of my gradient tensor incorrectly.
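For anyone who hits the same problem: Adagrad's accumulator needs the element-wise square of the gradient, so it keeps the same shape as the parameter. A tiny illustration (tensor shapes here are arbitrary):

```python
import torch

grad = torch.randn(3, 4)

# Element-wise square: one accumulated value per parameter entry.
sq = grad * grad            # equivalently grad.pow(2) or grad ** 2

acc = torch.zeros_like(grad)
acc += sq                   # running sum used to scale the per-parameter learning rate
print(acc.shape)            # torch.Size([3, 4]) -- same shape as the gradient
```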