Moving a tensor to CUDA

Hi, this works: a = torch.LongTensor(1).random_(0, 10).to("cuda"). But this won’t work:

a = torch.LongTensor(1).random_(0, 10)
a.to(device="cuda")

Is this by design, or am I simply missing something needed to move a tensor from CPU to CUDA?


If you are pushing tensors to a device or host, you have to reassign them, since Tensor.to() is not an in-place operation:

a = a.to(device='cuda')

nn.Modules push all parameters, buffers, and submodules recursively in place and don’t need the reassignment.
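
A minimal sketch of the difference, using a placeholder nn.Linear model and assuming a CUDA device is available:

import torch
import torch.nn as nn

# Tensors: .to() returns a new tensor, so the result must be reassigned
a = torch.LongTensor(1).random_(0, 10)
a = a.to(device='cuda')                    # without the assignment, a stays on the CPU
print(a.device)                            # cuda:0

# Modules: .to() moves parameters, buffers, and submodules in place
model = nn.Linear(10, 2)                   # placeholder model for illustration
model.to(device='cuda')                    # no reassignment needed
print(next(model.parameters()).device)     # cuda:0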


Hi,
I tried running this simple block of code:

list_torch = [1, 2, 3, 4]
tenso = torch.tensor(list_torch).view(-1, 1)
tenso = tenso.to(device='cuda')

I get this error:
RuntimeError Traceback (most recent call last)
in ()
----> 1 tenso = tenso.to(device='cuda')

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The code works fine on the CPU though; it throws the error in Colab as well as on my local machine.

CUDA operations are executed asynchronously, so you are most likely running into an error raised by a previous operation.
Did you rerun the code with CUDA_LAUNCH_BLOCKING=1 as suggested in the error message?
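
For reference, a rough sketch of how the flag can be set; it has to be in place before CUDA is initialized, and the script name below is just an example:

import os

# Set before the first CUDA call, ideally at the very top of the script
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# Alternatively, set it when launching the process (hypothetical script name):
#   CUDA_LAUNCH_BLOCKING=1 python train.py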

This code was part of a larger block: I had created an Encoder module with an input layer of size n whereas my vocab size was n+2; fixing this solved the error.
I was confused because the IDE kept pointing at the CUDA conversion lines as the bug, so I did the following:

  1. Shifted the model to the CPU
  2. Used try/except and printed the exception string

However, I could not understand how the layer mismatch gave rise to this error.
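
For what it’s worth, a minimal sketch of that failure mode, assuming the mismatch was between an input/embedding layer of size n and token indices that exceed it (the sizes here are made up):

import torch
import torch.nn as nn

n = 10
embedding = nn.Embedding(num_embeddings=n, embedding_dim=4)  # layer sized for n tokens

tokens = torch.tensor([1, 2, n + 1])    # n + 1 is out of range for this layer

# On the CPU this raises a clear "index out of range" error at this line:
# out = embedding(tokens)

# On the GPU the lookup kernel triggers a device-side assert instead, and since the
# launch is asynchronous, the error may only surface at a later, unrelated call
# such as a subsequent .to('cuda'):
# out = embedding.to('cuda')(tokens.to('cuda'))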

The CPU can run ahead, since CUDA operations are executed asynchronously in the background.
Unless you are blocking the code via CUDA_LAUNCH_BLOCKING=1, the stack trace will point to the current line of code executed on the host, which is often wrong.
In any case, good to hear you’ve narrowed it down.

@ptrblck, but what if I don’t touch CUDA_LAUNCH_BLOCKING but set the non_blocking argument to False?

In this case compute kernels will still be executed asynchronously, and you should not blindly trust the stack trace.
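
A small sketch of how one might force queued kernels to finish so any pending error surfaces close to the line that caused it; the operations are just placeholders:

import torch

x = torch.randn(1024, 1024)
x = x.to('cuda', non_blocking=False)   # the copy itself blocks the host here

y = x @ x                              # the matmul kernel is still launched asynchronously

# Wait for all queued kernels; a pending CUDA error (e.g. a device-side assert)
# would be raised at this point instead of at a later, unrelated call
torch.cuda.synchronize()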

@ptrblck, thanks for the reply! I just observe (remember, I don’t touch CUDA_LAUNCH_BLOCKING and keep non_blocking=False) that .to(device='cuda') changes the device value to cuda:0, but it seems to increase the GPU’s memory consumption only if copy=True. So, is moving to the GPU lazy in the copy=False case?

That’s generally not true and only the case when the tensor is already on the desired device. From the docs:

When copy is set, a new Tensor is created even when the Tensor already matches the desired conversion.
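
A quick sketch of that documented behavior, comparing data pointers to see whether a new tensor was actually allocated:

import torch

a = torch.randn(4, device='cuda')

b = a.to('cuda')                        # already on the desired device: no copy, same storage
print(b.data_ptr() == a.data_ptr())     # True

c = a.to('cuda', copy=True)             # copy=True forces a new allocation anyway
print(c.data_ptr() == a.data_ptr())     # False

# Moving from the CPU is not lazy either; the GPU allocation happens as part of .to()
d = torch.randn(4).to('cuda')
print(d.device)                         # cuda:0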

OK, it seems like the Google Colab GPU RAM measurement is… strange.