What's wrong with my Adam implementation?

Hello PyTorch community!
I’m trying to implement Adam myself as a learning exercise.

Here is my Adam implementation: https://gist.github.com/byorxyz/dfe3da1000e67aced1c7d9279351cb88

I think I implemented everything correctly; however, the loss graph of my implementation is much spikier than that of torch.optim.Adam.

[Figure: loss graph of my Adam implementation]

[Figure: loss graph of torch.optim.Adam]

If someone could look at my code and tell me what I am doing wrong, I’d be very grateful. Thank you for PyTorch!
The full code, including the data, is easy to run: https://github.com/byorxyz/AMS_pytorch/blob/master/AdamFails_1dConvex.ipynb
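
For comparison, here is a minimal sketch of the standard Adam update with bias correction, written in plain Python for a single scalar parameter. It is not the code from the gist. The names `adam_step` and `minimize_quadratic` and the toy quadratic loss are made up for illustration. The defaults (`lr=1e-3`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the ones torch.optim.Adam uses. Details worth checking against a custom implementation include the step counter starting at 1, the bias correction on both moment estimates, and `eps` being added after the square root.

```python
import math


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value.

    `state` is a dict holding the step count and both moment estimates;
    it is updated in place. (Hypothetical helper, for illustration only.)
    """
    state["step"] += 1
    t = state["step"]

    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad

    # Bias correction: both averages start at zero, so early values are too small.
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)

    # eps is added outside the square root.
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize_quadratic(x0=5.0, target=2.0, steps=2000, lr=0.05):
    """Minimize the toy 1-D convex loss (x - target)**2 with adam_step."""
    x = x0
    state = {"step": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        grad = 2.0 * (x - target)  # analytic gradient of the toy loss
        x = adam_step(x, grad, state, lr=lr)
    return x


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the very first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is roughly the sign of the gradient, so the parameter should move by about `lr` regardless of the gradient's magnitude. That makes a quick sanity check for a hand-written optimizer.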