The gradient descent part of my loop looks like this:
torch::Tensor loss = torch::nn::functional::mse_loss(guess, target);
optimizer.zero_grad();
loss.backward();
optimizer.step();
I’d like to terminate the loop when guess has stopped changing. Would I just check the norm of the difference between the previous guess and the current one, or is there a better way?
You could also use the total norm of the gradients as a stopping criterion: stop when it gets very small, which indicates that you are close to a critical point. This saves you from having to keep a copy of the previous parameters around.
With momentum or an adaptive optimizer the gradient norm and the actual parameter change won’t correspond exactly, but it’s still a common stopping criterion for gradient descent.