Once you move out of PyTorch and into something like NumPy,
autograd can no longer track the operations, so you (your loss
function) have to compute the gradients yourself. That's what `backward` is for.
Your `backward` doesn't do anything (except return its input as
its output), so the gradients that get passed to the optimizer carry
no information about the structure of your loss function. That is,
your gradients are incorrect, and the optimizer won't be moving
your weights toward a lower loss.
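Here is a minimal sketch of that failure mode (the `IdentityBackward` name and the squaring forward are just illustrative, not your actual code): the backward echoes `grad_output` unchanged, so the computed gradient ignores the loss entirely.

```python
import torch

class IdentityBackward(torch.autograd.Function):
    """Hypothetical stand-in for the broken setup: forward does its
    work outside autograd, backward just returns its input."""

    @staticmethod
    def forward(ctx, input):
        # Pretend this squaring happened in NumPy, invisibly to autograd.
        return torch.from_numpy(input.detach().numpy() ** 2)

    @staticmethod
    def backward(ctx, grad_output):
        # No chain rule applied -- this is the bug being described.
        return grad_output

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = IdentityBackward.apply(x).sum()
y.backward()
# x.grad is all ones, but the true gradient of sum(x**2) is 2*x,
# i.e. [2., 4., 6.] -- the optimizer would step in the wrong direction.
```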
You either need to rewrite your loss function using PyTorch tensor operations so that autograd can track and compute
the gradients automatically for you, or you have to do the calculus
on your loss function yourself and implement the resulting gradients
by hand in your `backward` function.
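A sketch of the second option, using a made-up MSE loss as a stand-in for your function (the `NumpyMSELoss` name and the NumPy forward are assumptions for illustration): the forward leaves autograd, so `backward` supplies the analytically derived gradient.

```python
import numpy as np
import torch

class NumpyMSELoss(torch.autograd.Function):
    """Hypothetical loss computed in NumPy, with a hand-derived backward."""

    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        x = input.detach().numpy()
        t = target.detach().numpy()
        # This mean-squared-error is computed outside autograd's view.
        loss = np.mean((x - t) ** 2)
        return torch.tensor(loss, dtype=input.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        # Calculus done by hand: d/dx mean((x - t)^2) = 2 * (x - t) / n
        grad_input = 2.0 * (input - target) / input.numel()
        # One gradient per forward argument; target needs none.
        return grad_output * grad_input, None

x = torch.randn(5, requires_grad=True)
t = torch.zeros(5)
loss = NumpyMSELoss.apply(x, t)
loss.backward()
# x.grad now holds the analytic gradient 2 * (x - t) / 5
```

For this particular loss the first option is of course simpler: `loss = ((x - t) ** 2).mean()` keeps everything inside autograd and needs no custom `backward` at all. `torch.autograd.gradcheck` is useful for verifying a hand-written backward against numerical differentiation.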