Very confused with changes in autograd

Hi,

I’m having a weird problem with autograd in recent PyTorch. I’m trying to reproduce a simple example from the official documentation, but in C++ rather than Python, basically so I have a test case. Here’s my code:

#include <iostream>
#include <torch/torch.h>

int main() {
  if (torch::cuda::is_available()) {
    std::cout << "CUDA device is available.\n";
  } else {
    std::cout << "CUDA device is not available.\n";
  }
  try {
    auto test = torch::ones({2, 2}, at::requires_grad()).cuda();
    if (test.device().type() == torch::kCUDA) {
      std::cout << "We are using CUDA\n";
    } else if (test.device().type() == torch::kCPU) {
      std::cout << "Oops... we are supposed to run on CUDA GPU, but we are running on CPU.";
    } else {
      std::cout << "Uuuhm, we are supposed to run on CUDA GPU, but we are not even running on CPU.";
    }
    auto a = torch::ones({2, 2}, torch::TensorOptions().dtype(torch::kFloat).requires_grad(true)).cuda();
    auto b = torch::randn({2, 2}, torch::TensorOptions().dtype(torch::kFloat)).cuda();
    std::cout << "a: " << a << "\n";
    auto c = a + b;
    std::cout << "c: " << c << "\n";
    auto cgrad = c.grad();
    std::cout << "cgrad : " << cgrad << "\n";
    // tried this - doesn't work either
    // torch::Tensor a_grad = torch::tensor({1.0, 0.1, 1.0, 0.01}, torch::TensorOptions().dtype(torch::kFloat)).cuda().view({2,2});
    torch::Tensor a_grad = torch::tensor({1.0, 1.0, 1.0, 1.0}, torch::TensorOptions().dtype(torch::kFloat)).cuda().view({2,2});
    std::cout << "a_grad: " << a_grad << "\n";
    // tried this as well, also doesn't work
    // c.sum().backward(a_grad);  // a.grad() will now hold the gradient of c w.r.t. a.
    c.sum().backward();  // a.grad() will now hold the gradient of c w.r.t. a.
    std::cout << "a.grad(): "<< a.grad() << "\n";
    std::cout << "a_grad: "<< a_grad << "\n";
    torch::Tensor g_loss = torch::binary_cross_entropy(a, b);
    g_loss.backward();
    std::cout << "g_loss: " << g_loss << "\n";
    std::cout << "a.grad(): " << a.grad() << "\n";
  } catch (const c10::Error& e) {
    // this was the original code (for CPU) that worked fine on PyTorch 1.2
    std::cout << e.what() << "\n";
    auto test = torch::ones({2, 2}, at::requires_grad());
    if (test.device().type() == torch::kCPU) {
      std::cout << "We are using CPU\n";
    } else if (test.device().type() == torch::kCUDA) {
      std::cout << "Oops... we are supposed to run on CPU, but we are running on CUDA GPU.";
    } else {
      std::cout << "Uuuhm, we are supposed to run on CPU, but we are not even running on CUDA GPU.";
    }
    auto a = torch::ones({2, 2}, at::requires_grad());
    auto b = torch::randn({2, 2});
    auto c = a + b;
    c.backward(a); // a.grad() will now hold the gradient of c w.r.t. a.
    std::cout << c << "\n";
  }
  return 0;
}

I’ve played around with different versions of it. The CPU-specific section in the catch block shows the structure I used before PyTorch 1.2 (it shows the CPU variant, but the GPU variant was essentially the same). The new style I’m trying currently sits only in the GPU section; that’s just for illustration here, as I’m actually trying the same thing for both. This is to account for the change in how autograd expects the gradient tensor (the vector in the Jacobian-vector product) to be passed when constructing gradients (link).
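
For reference, here’s a minimal sketch of what I understand the new convention to be (variable names are my own, not from the docs): for a non-scalar output you either pass an explicit gradient tensor to backward(), or reduce to a scalar first.

// Sketch only, assuming the PyTorch 1.4 C++ API; names like grad_out are mine.
auto x = torch::ones({2, 2}, torch::TensorOptions().requires_grad(true));
auto y = x * 2;                        // non-scalar output
auto grad_out = torch::ones({2, 2});   // the "vector" in the vector-Jacobian product
y.backward(grad_out);                  // explicit gradient needed because y is not a scalar
// alternatively: y.sum().backward();  // reduce to a scalar, no gradient argument needed
std::cout << x.grad() << "\n";         // should be 2s everywhere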

That’s all good, but since then I haven’t been able to get any gradients to work. I’m probably doing something wrong but can’t put my finger on it (or I’ve found a weird bug, no idea). The output I get is this:

$ make -C pytorch/autograd run
make[1]: Entering directory '/home/gizdov/Git/arch-package-tests/pytorch/autograd'
g++ autograd.cpp -I/usr/include/torch/csrc/api/include -I/usr/include/python3.8 -I/usr/include/python3.8m -L/usr/lib/pytorch -L/opt/cuda/lib -lc10 -ltorch -lnvrtc -lcuda -o autograd
./autograd
CUDA device is available.
We are using CUDA
a:  1  1
 1  1
[ CUDAFloatType{2,2} ]
c:  2.8679  0.9870
 0.8103  1.9954
[ CUDAFloatType{2,2} ]
cgrad : [ Tensor (undefined) ]
a_grad:  1  1
 1  1
[ CUDAFloatType{2,2} ]
a.grad(): [ Tensor (undefined) ]
a_grad:  1  1
 1  1
[ CUDAFloatType{2,2} ]
g_loss: 9.25194
[ CUDAFloatType{} ]
a.grad(): [ Tensor (undefined) ]
make[1]: Leaving directory '/home/gizdov/Git/arch-package-tests/pytorch/autograd'

My gradients are always undefined, or I get a runtime error from c10 about the gradient ("dispatch") tensor I pass if a_grad is missing or malformed.

Could someone please help?
Thanks.

EDIT: This is on PyTorch 1.4.1, CUDA 10.2 (Compute Capability 5.2), GCC 8.4.0, Python 3.8.2

Please see the comment here: ValueError: can't optimize a non-leaf Tensor?. The main problem you are running into is that
auto a = torch::ones({2, 2}, torch::TensorOptions().dtype(torch::kFloat).requires_grad(true)).cuda();
is NOT a leaf node, so autograd does not accumulate gradients into a.grad. This is because you’re calling .cuda() on the actual leaf node, returning a new tensor. If you directly construct the tensor on CUDA, then you can get gradients on it:

auto a = torch::ones({2, 2}, torch::TensorOptions().dtype(torch::kFloat).requires_grad(true).device(torch::kCUDA));
auto c = (a + a).sum();
c.backward();
a.grad();  // now has something
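
A quick way to see the difference between the two constructions (again just a sketch, and is_leaf() availability may depend on your version) is to compare is_leaf():

// Sketch: .cuda() returns a new, non-leaf tensor; constructing directly on CUDA keeps a leaf.
auto opts = torch::TensorOptions().dtype(torch::kFloat).requires_grad(true);
auto not_leaf = torch::ones({2, 2}, opts).cuda();
auto leaf = torch::ones({2, 2}, opts.device(torch::kCUDA));
std::cout << not_leaf.is_leaf() << " " << leaf.is_leaf() << "\n";  // expect: 0 1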

Hi,

Thanks. Indeed, that solved part of the issue. However, does that mean I cannot get intermediate gradients? For example, for c = a + b, c.grad() still comes back undefined.

In Python, you can use c.retain_grad() so that .grad is populated for non-leaf tensors.
Not sure if it is available in the C++ API yet? cc @yf225

Otherwise, you can use autograd::grad and provide the inputs you want gradients for; you can specify any Tensor that requires gradients (they don’t need to be leaves).
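
Something along these lines should work for the autograd::grad route (just a sketch; the exact overload may differ between versions, and the variable names are mine):

// Sketch: ask autograd::grad directly for d(c)/d(a); a does not have to be a leaf.
auto a = torch::ones({2, 2}, torch::TensorOptions().requires_grad(true).device(torch::kCUDA));
auto b = torch::randn({2, 2}, torch::TensorOptions().device(torch::kCUDA));
auto c = a + b;
auto grad_out = torch::ones({2, 2}, torch::TensorOptions().device(torch::kCUDA));  // gradient w.r.t. c
auto grads = torch::autograd::grad({c}, {a}, {grad_out});  // returns a vector of Tensors
std::cout << grads[0] << "\n";  // d(c)/d(a), all ones here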

Thanks.

Well, I did indeed get it working to a point, but the C++ API does not seem to implement this. Firstly, retain_grad() is not defined. Secondly, when using auto cgrad = torch::autograd::grad({c}, {a, b}, {a_dispatch});, I get the following:

terminate called after throwing an instance of 'std::runtime_error'
  what():  the derivative for 'target' is not implemented

So this is also not implemented. I guess for now I will stick with what works.
Cheers.

EDIT: wait, no, this seems to be caused by b also needing to require a gradient to satisfy the autograd::grad() call, which then breaks torch::binary_cross_entropy(a, b): the derivative with respect to the target is what is not implemented. Not sure how to work around it - maybe treat b as a constant for the time being.
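
In case it’s useful, the workaround I have in mind (just my own sketch, not something confirmed anywhere) is to detach the target so autograd never needs the unimplemented derivative w.r.t. it, and to only ask autograd::grad for the inputs I actually want:

// Sketch: treat the target as a constant via detach(), so no derivative w.r.t. it is needed.
auto probs  = torch::sigmoid(a);           // binary_cross_entropy expects inputs in (0, 1)
auto target = torch::sigmoid(b).detach();  // squash b into (0, 1) and cut it out of the graph
auto loss   = torch::binary_cross_entropy(probs, target);
auto grads  = torch::autograd::grad({loss}, {a}, {torch::ones_like(loss)});  // only d(loss)/d(a)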

retain_grad() should already work on torch::Tensor, we have a test here: https://github.com/pytorch/pytorch/blob/64a6faa2c8435a9e96743b078d9f2ae2b7ef1cc3/test/cpp/api/autograd.cpp#L129-L153
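
Based on that test, something like this should work (sketch following the linked test; whether it is in your 1.4.1 build may differ):

// Sketch: retain_grad() asks autograd to populate .grad on a non-leaf tensor.
auto a = torch::ones({2, 2}, torch::TensorOptions().requires_grad(true));
auto b = torch::randn({2, 2});
auto c = a + b;      // c is a non-leaf
c.retain_grad();     // without this, c.grad() stays undefined after backward()
c.sum().backward();
std::cout << c.grad() << "\n";  // all ones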
