Hi, this works:
a = torch.LongTensor(1).random_(0, 10).to("cuda")
but this won't work:
a = torch.LongTensor(1).random_(0, 10)
a.to(device="cuda")
Is this by design, or am I simply missing something when converting a tensor from CPU to CUDA?
If you are pushing tensors to a device or host, you have to reassign them:
a = a.to(device='cuda')
nn.Modules push all parameters, buffers, and submodules recursively and don't need the reassignment.
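A minimal sketch of the difference (falling back to the CPU when no GPU is available, so the snippet stays runnable anywhere):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU fallback for illustration

# Tensor.to() is out-of-place: it returns a new tensor, so you must reassign.
a = torch.LongTensor(1).random_(0, 10)
a = a.to(device)  # without the reassignment, `a` would stay on the CPU

# nn.Module.to() moves the module's parameters and buffers in place,
# so no reassignment is needed (it returns the module only to allow chaining).
model = nn.Linear(4, 2)
model.to(device)

print(a.device, next(model.parameters()).device)
```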
Hi,
I try running this simple block of code:
list_torch = [1,2,3,4]
tenso = torch.tensor(list_torch).view(-1,1)
tenso = tenso.to(device='cuda')
I get this error:
RuntimeError Traceback (most recent call last)
in ()
----> 1 tenso = tenso.to(device='cuda')
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
The code works fine on the CPU, but it throws this error both in Colab and on my local machine.
CUDA operations are executed asynchronously, so you are most likely running into an error raised by a previous operation.
Did you rerun the code with CUDA_LAUNCH_BLOCKING=1 as suggested in the error message?
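As a side note, CUDA_LAUNCH_BLOCKING only takes effect if it is set before the CUDA context is created, so either prefix the launch command (CUDA_LAUNCH_BLOCKING=1 python script.py) or set it at the very top of the script. A minimal sketch:

```python
import os

# Must be set before importing torch / touching the GPU,
# otherwise the variable has no effect on kernel launches.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # kernels now launch synchronously; the stack trace points at the failing op
```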
This code was part of a larger block: I had created an Encoder module with an input layer of size n, whereas my vocab size was n+2; fixing this solved the error.
I was confused, as the IDE kept pointing at the CUDA conversion lines as the bug.
Still, I could not understand how the layer mismatch gave rise to this error.
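The mismatch causes an out-of-range index lookup inside the input/embedding layer. On the CPU the same bug fails immediately with a clear IndexError, which is a quick way to confirm it; a minimal sketch (the layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

n = 10                    # hypothetical input-layer / embedding size
emb = nn.Embedding(n, 4)  # valid indices are 0 .. n-1

idx = torch.tensor([n + 1])  # a vocab entry beyond the layer's range
try:
    emb(idx)                 # on the CPU this fails right here with an IndexError
except IndexError as e:
    print("caught:", e)
# On CUDA the same lookup becomes a device-side assert, which may only
# surface at a later, unrelated call such as tenso.to(device='cuda').
```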
The CPU can run ahead, since CUDA operations are executed asynchronously in the background.
Unless you block the kernel launches via CUDA_LAUNCH_BLOCKING=1, the stack trace will point to the line of code currently being executed on the host, which is often the wrong one.
In any case, good to hear you’ve narrowed it down.
In this case, compute kernels will still be executed asynchronously, and you should not blindly trust the stack trace.
@ptrblck, thanks for the reply! I just see (remember, I don't touch CUDA_LAUNCH_BLOCKING and non_blocking=False) that .to(device='cuda') changes the device value to cuda:0, but it would increase the GPU's memory consumption if and only if copy=True. So, is moving to the GPU lazy in the copy=False case?
That's generally not true and is only the case when the tensor is already on the desired device. From the docs:
When copy is set, a new Tensor is created even when the Tensor already matches the desired conversion.
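Even without a GPU, the same semantics can be checked on the CPU: .to() returns the tensor itself when it already matches the target device and dtype, and only copy=True forces a new allocation:

```python
import torch

t = torch.randn(3)

# Already on the target device/dtype: .to() is a no-op and returns t itself.
same = t.to("cpu")
print(same is t)  # True, no new memory is allocated

# copy=True creates a new tensor even though t already matches the target.
copied = t.to("cpu", copy=True)
print(copied is t)            # False, a fresh allocation
print(torch.equal(copied, t)) # True, same values
```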
Ok, it seems like the Google Colab GPU RAM measurement is… strange.