Moving a tensor to CUDA

Hi, this works: a = torch.LongTensor(1).random_(0, 10).to("cuda"). But this won’t work:

a = torch.LongTensor(1).random_(0, 10)
a.to(device="cuda")

Is this by design, or am I simply missing something needed to move a tensor from CPU to CUDA?


If you are pushing tensors to a device or host, you have to reassign them, since Tensor.to() is not an in-place operation:

a = a.to(device='cuda')

nn.Modules push all parameters, buffers, and submodules recursively in place and don’t need the reassignment.
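
A minimal sketch of the difference, using a placeholder nn.Linear model and assuming a CUDA device is available:

import torch
import torch.nn as nn

# Tensors: .to() returns a new tensor, so the result must be reassigned
a = torch.LongTensor(1).random_(0, 10)
a = a.to(device='cuda')                    # without the assignment, a stays on the CPU
print(a.device)                            # cuda:0

# Modules: .to() moves parameters, buffers, and submodules in place
model = nn.Linear(10, 2)                   # placeholder model for illustration
model.to(device='cuda')                    # no reassignment needed
print(next(model.parameters()).device)     # cuda:0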


Hi,
I tried running this simple block of code:

list_torch = [1, 2, 3, 4]
tenso = torch.tensor(list_torch).view(-1, 1)
tenso = tenso.to(device='cuda')

I get this error:
RuntimeError Traceback (most recent call last)
in ()
----> 1 tenso = tenso.to(device='cuda')

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The code works fine on the CPU though; it throws the error in Colab as well as on my local machine.

CUDA operations are executed asynchronously, so you are most likely running into an error raised by a previous operation.
Did you rerun the code with CUDA_LAUNCH_BLOCKING=1 as suggested in the error message?
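
For reference, a rough sketch of how the flag can be set; it has to be in place before CUDA is initialized, and the script name below is just an example:

import os

# Set before the first CUDA call, ideally at the very top of the script
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# Alternatively, set it when launching the process (hypothetical script name):
#   CUDA_LAUNCH_BLOCKING=1 python train.py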

This code was part of a larger block: I had created an Encoder module with an input layer of size n whereas my vocab size was n+2; fixing this solved the error.
I was confused because the IDE kept pointing at the CUDA conversion lines as the bug, so I did the following:

  1. Shifted the model to the CPU
  2. Used try/except and printed the exception string

However, I could not understand how the layer mismatch gave rise to this error.
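
For what it’s worth, a minimal sketch of that failure mode, assuming the mismatch was between an input/embedding layer of size n and token indices that exceed it (the sizes here are made up):

import torch
import torch.nn as nn

n = 10
embedding = nn.Embedding(num_embeddings=n, embedding_dim=4)  # layer sized for n tokens

tokens = torch.tensor([1, 2, n + 1])    # n + 1 is out of range for this layer

# On the CPU this raises a clear "index out of range" error at this line:
# out = embedding(tokens)

# On the GPU the lookup kernel triggers a device-side assert instead, and since the
# launch is asynchronous, the error may only surface at a later, unrelated call
# such as a subsequent .to('cuda'):
# out = embedding.to('cuda')(tokens.to('cuda'))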

The CPU can run ahead, since CUDA operations are executed asynchronously in the background.
Unless you are blocking the code via CUDA_LAUNCH_BLOCKING=1, the stack trace will point to the current line of code executed on the host, which is often wrong.
In any case, good to hear you’ve narrowed it down.

@ptrblck, but what if I don’t touch CUDA_LAUNCH_BLOCKING but set the non_blocking argument to False?

In this case compute kernels will still be executed asynchronously, and you should not blindly trust the stack trace.
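
A small sketch of how one might force queued kernels to finish so any pending error surfaces close to the line that caused it; the operations are just placeholders:

import torch

x = torch.randn(1024, 1024)
x = x.to('cuda', non_blocking=False)   # the copy itself blocks the host here

y = x @ x                              # the matmul kernel is still launched asynchronously

# Wait for all queued kernels; a pending CUDA error (e.g. a device-side assert)
# would be raised at this point instead of at a later, unrelated call
torch.cuda.synchronize()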

@ptrblck, thanks for the reply! I just observe (remember, I don’t touch CUDA_LAUNCH_BLOCKING and keep non_blocking=False) that .to(device='cuda') changes the device value to cuda:0, but it seems to increase the GPU’s memory consumption only if copy=True. So, is moving to the GPU lazy in the copy=False case?

That’s generally not true and only the case when the tensor is already on the desired device. From the docs:

When copy is set, a new Tensor is created even when the Tensor already matches the desired conversion.
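
A quick sketch of that documented behavior, comparing data pointers to see whether a new tensor was actually allocated:

import torch

a = torch.randn(4, device='cuda')

b = a.to('cuda')                        # already on the desired device: no copy, same storage
print(b.data_ptr() == a.data_ptr())     # True

c = a.to('cuda', copy=True)             # copy=True forces a new allocation anyway
print(c.data_ptr() == a.data_ptr())     # False

# Moving from the CPU is not lazy either; the GPU allocation happens as part of .to()
d = torch.randn(4).to('cuda')
print(d.device)                         # cuda:0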

OK, it seems like the Google Colab GPU RAM measurement is… strange.