I have a running error with CUDA in C++ libtorch as follows:
terminate called after throwing an instance of 'c10::Error'
what(): Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)
Exception raised from checkSameGPU at /pytorch/aten/src/ATen/TensorUtils.cpp:122 (most recent call first):
It seems to me that there is a conflict between GPU and CPU, but I am not sure exactly what the conflict is or how to correct it.
The related code is:
torch::Tensor forward(int64_t batch_size, bool cuda = false) {
    torch::Tensor x = torch::autograd::Variable(torch::rand({batch_size, z_dim}));
    if (cuda)
        x = x.cuda();
    x = torch::nn::functional::softplus(bn1(fc1(x)) + bn1_b);
Earlier in the code I also have a line related to the CPU:
return loss.data().cpu();
Should I modify this line in some other way? Any comments, please?
@Chen0729
This might not be a conflict as such. The error is caused by mixing a CPU tensor (storage in main memory) and a GPU tensor (storage on the GPU) in the same operation.
Check your code; I would guess there is somewhere you passed both CPU and GPU tensors together to a function.
Thanks, so the line that triggers the error is:
x = torch::nn::functional::softplus(bn1(fc1(x)) + bn1_b);
where x is defined as:
torch::Tensor x = torch::autograd::Variable(torch::rand({batch_size, z_dim}));
The code does not let me use
assert(x.device().type() == torch::kCUDA);
So in the debugger, how should I check whether x is a CPU tensor or a GPU tensor? Are there any other tensors I need to check? Thanks.
@Chen0729
I guess the problem is here:
bn1_b = register_parameter("bn1_b", torch::zeros(500));
bn2_b = register_parameter("bn2_b", torch::zeros(500));
Both zero tensors are CPU tensors.
x was converted to the GPU, but bn1_b was not, so the addition in
x = torch::nn::functional::softplus(bn1(fc1(x)) + bn1_b);
mixes devices.
Also, I am wondering what the error message is when you use assert? The assert should work.