Simple Linear Regression with PyTorch

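I think the small learning rate works because the values in the data are large. The gradient with respect to the weight scales with the size of the inputs, so with large raw values each update is huge, and a tiny learning rate is what keeps the updates from overshooting and the loss from blowing up. That’s also why normalizing the data helps: once the inputs are on a scale of roughly 1, the gradients are moderate and an ordinary learning rate works.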
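Here is a minimal sketch of that effect. It is plain Python rather than PyTorch so the arithmetic is visible, and everything in it is an assumption for illustration: the data (`y = 3x + 5` with inputs from 100 to 1000), the helper `fit`, and the learning rates are made up, not taken from the original code.

```python
import math

def fit(xs, ys, lr, steps=200):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n  # scales with x
        grad_b = 2 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
        if not (math.isfinite(w) and math.isfinite(b)):
            break  # parameters overflowed to inf/nan, so stop early
    return w, b

# Hypothetical data with large input values: y = 3x + 5
xs = [100.0 * i for i in range(1, 11)]
ys = [3.0 * x + 5.0 for x in xs]

# 1) Raw inputs with an ordinary learning rate: the updates overshoot
w_raw, b_raw = fit(xs, ys, lr=0.01)

# 2) Raw inputs with a tiny learning rate: stable
w_small, b_small = fit(xs, ys, lr=1e-7)

# 3) Standardized inputs with an ordinary learning rate: stable
mean = sum(xs) / len(xs)
std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
xn = [(x - mean) / std for x in xs]
w_n, b_n = fit(xn, ys, lr=0.1)

# Map the parameters back to the original input scale
w_fit = w_n / std
b_fit = b_n - w_n * mean / std

print("raw, lr=0.01:       ", w_raw, b_raw)
print("raw, lr=1e-7:       ", w_small, b_small)
print("normalized, lr=0.1: ", w_fit, b_fit)
```

In this sketch the first run diverges, the second gets the slope close to 3 but barely moves the bias in 200 steps, and the third recovers both the slope and the bias. The same pattern should carry over to a PyTorch model trained with plain SGD, since the gradient of the weight is still proportional to the inputs.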