Hi,
I’m having a weird problem with autograd in recent PyTorch. I’m trying to reproduce a simple example from the official documentation, but in C++ rather than Python, essentially to have a test case. Here’s my code:
#include <iostream>
#include <torch/torch.h>

int main() {
    if (torch::cuda::is_available()) {
        std::cout << "CUDA device is available.\n";
    } else {
        std::cout << "CUDA device is not available.\n";
    }
    try {
        auto test = torch::ones({2, 2}, at::requires_grad()).cuda();
        if (test.device().type() == torch::kCUDA) {
            std::cout << "We are using CUDA\n";
        } else if (test.device().type() == torch::kCPU) {
            std::cout << "Oops... we are supposed to run on CUDA GPU, but we are running on CPU.";
        } else {
            std::cout << "Uuuhm, we are supposed to run on CUDA GPU, but we are not even running on CPU.";
        }

        auto a = torch::ones({2, 2}, torch::TensorOptions().dtype(torch::kFloat).requires_grad(true)).cuda();
        auto b = torch::randn({2, 2}, torch::TensorOptions().dtype(torch::kFloat)).cuda();
        std::cout << "a: " << a << "\n";

        auto c = a + b;
        std::cout << "c: " << c << "\n";

        auto cgrad = c.grad();
        std::cout << "cgrad : " << cgrad << "\n";

        // tried this - doesn't work either
        // torch::Tensor a_grad = torch::tensor({1.0, 0.1, 1.0, 0.01}, torch::TensorOptions().dtype(torch::kFloat)).cuda().view({2, 2});
        torch::Tensor a_grad = torch::tensor({1.0, 1.0, 1.0, 1.0}, torch::TensorOptions().dtype(torch::kFloat)).cuda().view({2, 2});
        std::cout << "a_grad: " << a_grad << "\n";

        // tried this as well, also doesn't work
        // c.sum().backward(a_grad); // a.grad() will now hold the gradient of c w.r.t. a.
        c.sum().backward(); // a.grad() will now hold the gradient of c w.r.t. a.
        std::cout << "a.grad(): " << a.grad() << "\n";
        std::cout << "a_grad: " << a_grad << "\n";

        torch::Tensor g_loss = torch::binary_cross_entropy(a, b);
        g_loss.backward();
        std::cout << "g_loss: " << g_loss << "\n";
        std::cout << "a.grad(): " << a.grad() << "\n";
    } catch (const c10::Error& e) {
        std::cout << e.what() << "\n";

        // this was the original code (for CPU) that worked fine on PyTorch 1.2
        auto test = torch::ones({2, 2}, at::requires_grad());
        if (test.device().type() == torch::kCPU) {
            std::cout << "We are using CPU\n";
        } else if (test.device().type() == torch::kCUDA) {
            std::cout << "Oops... we are supposed to run on CPU, but we are running on CUDA GPU.";
        } else {
            std::cout << "Uuuhm, we are supposed to run on CPU, but we are not even running on CUDA GPU.";
        }

        auto a = torch::ones({2, 2}, at::requires_grad());
        auto b = torch::randn({2, 2});
        auto c = a + b;
        c.backward(a); // a.grad() will now hold the gradient of c w.r.t. a.
        std::cout << c << "\n";
    }
    return 0;
}
I’ve played around with different versions of it. The CPU-specific section (in the catch block) shows the structure I used before PyTorch 1.2; it shows the CPU variants, but it was essentially equivalent for the GPU as well. The new style I’m trying to write currently sits only in the GPU section. That split is just for illustration here; I’m actually trying the same thing for both. This is meant to account for the change in how autograd deals with passing the dispatch and the Jacobian to construct the gradients (link).
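For reference, this is (roughly) the Python version of the docs example I’m porting, shown CPU-only for simplicity. Since c depends on a elementwise, every element of a.grad should come out as 1, which is the behavior I’m expecting from the C++ version too:

```python
import torch

# Leaf tensor with requires_grad=True, an elementwise op,
# then backward() on a scalar reduction.
a = torch.ones(2, 2, requires_grad=True)
b = torch.randn(2, 2)
c = a + b

c.sum().backward()  # d(sum(c))/da is all ones
print(a.grad)       # expected: a 2x2 tensor of ones
```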
That’s all good, but since then I haven’t been able to get any gradients to work. I’m probably doing something wrong but can’t put my finger on it (or I’ve found a weird bug, no idea). The output I get is this:
$ make -C pytorch/autograd run
make[1]: Entering directory '/home/gizdov/Git/arch-package-tests/pytorch/autograd'
g++ autograd.cpp -I/usr/include/torch/csrc/api/include -I/usr/include/python3.8 -I/usr/include/python3.8m -L/usr/lib/pytorch -L/opt/cuda/lib -lc10 -ltorch -lnvrtc -lcuda -o autograd
./autograd
CUDA device is available.
We are using CUDA
a: 1 1
1 1
[ CUDAFloatType{2,2} ]
c: 2.8679 0.9870
0.8103 1.9954
[ CUDAFloatType{2,2} ]
cgrad : [ Tensor (undefined) ]
a_grad: 1 1
1 1
[ CUDAFloatType{2,2} ]
a.grad(): [ Tensor (undefined) ]
a_grad: 1 1
1 1
[ CUDAFloatType{2,2} ]
g_loss: 9.25194
[ CUDAFloatType{} ]
a.grad(): [ Tensor (undefined) ]
make[1]: Leaving directory '/home/gizdov/Git/arch-package-tests/pytorch/autograd'
My gradients are always undefined, or I get a runtime error from c10 about passing the wrong dispatch tensor when a_grad is missing or malformed.
Could someone please help?
Thanks.
EDIT: This is on PyTorch 1.4.1, CUDA 10.2 (compute capability 5.2), GCC 8.4.0, Python 3.8.2.