a.sub_(lr * a.grad) ??

As I understand it, this subtracts (learning rate * gradient) from the coefficients…

But why is it necessary to subtract?

Thanks :smile:

If anyone has the same question, it’s answered here
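
For what it's worth, here is how I picture it now: the gradient points in the direction where the loss increases, so stepping against it (subtracting) is what makes the loss go down. Below is a minimal sketch in plain Python rather than PyTorch. The toy loss `(a - 3)^2`, the hand-written `grad` function, the `lr` value, and the names `step_subtract` and `step_add` are all my own assumptions for illustration, not anything from the course code.

```python
# Hypothetical toy loss, minimised at a = 3 (chosen only for illustration).
def loss(a):
    return (a - 3.0) ** 2


def grad(a):
    # Derivative of the toy loss with respect to a.
    return 2.0 * (a - 3.0)


lr = 0.1  # learning rate (assumed value)


def step_subtract(a):
    # Plain-Python analogue of a.sub_(lr * a.grad): move against the gradient.
    return a - lr * grad(a)


def step_add(a):
    # The opposite choice: move along the gradient.
    return a + lr * grad(a)


a_start = 0.0
a_down = step_subtract(a_start)  # moves toward the minimum at 3
a_up = step_add(a_start)         # moves away from the minimum

print("start:    a =", a_start, "loss =", loss(a_start))
print("subtract: a =", a_down, "loss =", loss(a_down))
print("add:      a =", a_up, "loss =", loss(a_up))
```

With these assumed values, subtracting lowers the loss and adding raises it, which is why the update uses `sub_` rather than `add_`.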