Parameters not changing after optimization

I’m trying to optimize some manually created parameters to see how the autograd system works. But the parameters never change for the same input. Here is the code:

#include <iostream>
#include <vector>
#include <torch/torch.h>

torch::Tensor forward_and_optimize_parameter(
    torch::Tensor&& input,
    std::vector<torch::Tensor>& parameters,
    torch::optim::SGD& optim) {
  auto loss = torch::sum(torch::pow(input + parameters.front(), 2));
  optim.zero_grad();
  loss.backward();
  optim.step(); // I expect the parameters to change after this, but they don't. Why?
  return loss;
}

int main() {
  auto options = torch::TensorOptions().requires_grad(true);
  std::vector<torch::Tensor> parameters{torch::rand(5, options)};
  torch::optim::SGD optimizer(parameters, /*lr=*/1);

  for (size_t i = 0; i < 10; i++) {
    auto loss = forward_and_optimize_parameter(torch::ones({5}), parameters, optimizer);
    std::cout << loss << std::endl; // the loss stays the same for the same input; is the optimizer not working?
  }

  return 0;
}

You win the “exceptionally bad luck in picking a small example” award :trophy:.

Your parameters do change, just your loss does not.

So what happens is this: with the all-ones input, your loss per element is (x+1)**2, its gradient is 2*(x+1), and with an SGD lr of 1 the update x - 2*(x+1) = -2-x makes the parameter oscillate between its initial value and -2-x. Both give exactly the same loss, since (-2-x+1)**2 = (x+1)**2. With an lr of 0.1, or any other positive lr below 1, your loss decreases as you would expect.
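The same argument works for any learning rate η. Here is a short worked derivation per element of the parameter (x is one entry, the input is all ones as in the question), assuming plain SGD without momentum or weight decay, which is what SGD(parameters, 1) configures by default:

```latex
\begin{aligned}
L(x) &= (x + 1)^2, \qquad \frac{\partial L}{\partial x} = 2(x + 1) \\
x' &= x - \eta \cdot 2(x + 1) \\
x' + 1 &= (1 - 2\eta)(x + 1) \\
L(x') &= (1 - 2\eta)^2 \, L(x)
\end{aligned}
```

With η = 1 the factor (1 − 2η)² is exactly 1, so the loss never moves while x flips to −2 − x. With η = 0.1 the factor is 0.64, so each step cuts the loss by 36%. The loss shrinks whenever |1 − 2η| < 1, i.e. for 0 < η < 1, and grows for η > 1. Since the summed loss is a sum of independent per-element terms, the total loss scales by the same factor.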

If you’ll allow two unsolicited comments:

  • Many people consider using namespace std a bad habit. Given that many people will read your code on the forums, I’d advise against it.
  • Just 10 iterations would be fully sufficient to see that the loss doesn’t change. :slight_smile:

I will admit that I spent quite a while checking whether you had accidentally made the parameter a non-leaf tensor somewhere, and only when I printed the gradient (to see whether there was one at all) did I notice what was happening.

Best regards

Thomas


Thanks Tom, you’re a lifesaver. Just as you said, I changed the learning rate to 0.1 and everything works fine.

I have also dropped the using namespace std and changed the iteration count in the original question. Thanks again for these great tips; I’m kind of a newbie in C++ :smile: