Allocation of Tensor on CUDA fails

Hi, I want to create a tensor from an existing image buffer and have it copied directly to the GPU during creation.

The model was loaded like this:
mod_ = torch::jit::load("/data/deeplabv3-mobilenetv3-large.pt", torch::Device(torch::kCUDA, 0));

To create the tensor:
torch::from_blob((void*)buffer, {1, 3, 520, 520}, torch::TensorOptions().dtype(torch::kFloat32).device(torch::Device(torch::kCUDA, 0)));

But this fails with this error:
Specified device cuda:0 does not match device of data cuda:-2

I do not understand where this cuda:-2 comes from.
A workaround I found is to create the tensor on the CPU and then call tensor.to(cudadevice). But I want to understand why the first approach fails.

Did you manage to solve the problem? I hit the same error!

No, so far I am still using my workaround.

I am also running into the same issue. The problem with using to(device) is that it creates a non-leaf tensor, which causes further headaches. How did you get the gradients of tensors moved via .to(device)?

You could create the tensor directly on the right device, if possible, or you could call .retain_grad() on the intermediate tensors.

How? This is still an issue.