About a runtime error with CUDA in C++

Hi, All

I have a runtime error with CUDA in C++ libtorch, as follows:
terminate called after throwing an instance of 'c10::Error'
what(): Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)
Exception raised from checkSameGPU at /pytorch/aten/src/ATen/TensorUtils.cpp:122 (most recent call first):
It seems to me that there is a conflict between the GPU and the CPU, but I am not sure exactly what the conflict is or how to correct it.
The related code is:
torch::Tensor forward(int64_t batch_size, bool cuda = false) {
    torch::Tensor x = torch::autograd::Variable(torch::rand({batch_size, z_dim}));
    x = x.cuda();

Before that, I have a line related to the CPU:
return loss.data().cpu();

Should I modify this line in some other way? Any comments, please?


This might not be a conflict as such. The error is caused by mixing a CPU tensor (whose storage is in main memory) and a GPU tensor (whose storage is on the GPU) in the same function.
Check your code; I suspect there is somewhere you passed a CPU tensor and a GPU tensor together to a function.

Thanks. The line that triggers the error is where x is defined:
torch::Tensor x=torch::autograd::Variable(torch::rand({batch_size,z_dim}));

The code does not allow me to use


So in the debugger, how should I check whether x is a CPU tensor or a GPU tensor? Are there any other tensors I need to check? Thanks.

bn1, fc1 and bn1_b are defined as:

GeneratorImpl(int64_t z_dim, int64_t output_dim){

    fc1 = register_module("fc1", torch::nn::Linear(torch::nn::LinearOptions(z_dim, 500).bias(false)));

    bn1 = register_module("bn1", torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(500).eps(1e-6).momentum(0.5).affine(false)));
    fc2 = register_module("fc2", torch::nn::Linear(torch::nn::LinearOptions(500, 500).bias(false)));
    bn2 = register_module("bn2", torch::nn::BatchNorm1d(torch::nn::BatchNorm1dOptions(500).eps(1e-6).momentum(0.5).affine(false)));
    fc3 = register_module("fc3",LinearWeightNorm(500, output_dim, 1));
    bn1_b = register_parameter("bn1_b", torch::zeros(500));
    bn2_b = register_parameter("bn2_b", torch::zeros(500));
}

I guess the problem is here:
bn1_b = register_parameter("bn1_b", torch::zeros(500));
bn2_b = register_parameter("bn2_b", torch::zeros(500));
Both zero tensors are CPU tensors.

x was converted to the GPU, but bn1_b was not.

And I am wondering: what is the error message when you add an assert? An assert should be doable here.

Thanks for your comment.

Really appreciate that.