RuntimeError: CUDA error: invalid argument second time I run the same code

Hi everyone,

We are using torch to accelerate the computation of some iterative solvers within a bigger project.
The code runs perfectly when using CPU capabilities but when using GPU we run into this runtime error the second time we run the code.

The use of torch is enclosed within a method that is called several times during the whole run and it doesn’t produce problems. The issue appears if we use a script to run this code several times. Then the second time it produces this runtime error the first time it tries to allocate a tensor.

kspace_torch = torch.tensor(kspace, dtype=torch.complex64, device=device)
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stack trace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

If we run twice the code in separate calls it does not produce any problem, so we assume it is a problem of not freeing well the resources either for torch or the other parts of the code that use GPU. Does anyone have any clue of what could be happening?

Thank you very much!

can you run your script with CUDA_LAUNCH_BLOCKING=1 and report what the stack-trace looks like?

CUDA errors are often reported on memory allocations even if the true location of the error is not at the allocation – because CUDA is an asynchronous computing model.

Sure, thanks! I did it and it doesn’t change the location of the error:

Traceback (most recent call last):
File “/home/engine/src/edutool/edutool/”, line 488, in spintwin_execution
frame[‘images’] = frame_reconstruction(
File “/home/engine/src/recon/recon/”, line 94, in frame_reconstruction
kspace_torch = torch.tensor(kspace, dtype=torch.complex64, device=device)
RuntimeError: CUDA error: invalid argument
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Also, we log the status of the gpus at the time of the error and they seem to be free:

‘gpus’: [{‘name’: ‘NVIDIA L4’, ‘memory_total’: 23034, ‘memory_free’: 21660, ‘compute_capability’: ‘8.9’}, {‘name’: ‘NVIDIA L4’, ‘memory_total’: 23034, ‘memory_free’: 21660, ‘compute_capability’: ‘8.9’}, {‘name’: ‘NVIDIA L4’, ‘memory_total’: 23034, ‘memory_free’: 20952, ‘compute_capability’: ‘8.9’}]

Could you update PyTorch to the latest nightly release and check if you would still see the error?
We were seeing similar issues in 2.0.0 (should have been fixed in 2.0.1 if I’m not mistaken) for sm_89 devices using the BetterTransformer backend.
If this still doesn’t help, could you post a minimal and executable code snippet reproducing the issue, please?

Thanks for the answer! We’re actually using the 2.0.1 but I also tried with the nightly release and it didn’t help.
I wish I could reproduce it in a minimal code snippet, but this code also runs some wrappers for internal code written in C++ that uses the GPU. The problem must come from the interaction with these wrappers, since we’re not able to reproduce the problem without those calls.

You could then try to narrow down the issue via cuda-gdb or compute-sanitizer, which should hopefully point to the failing kernel.

Thanks a lot! We found that the problem comes from doing a


between calls. Is there any way we could avoid this problem from the torch side? We need to reset the device because of performance in the CUDA side.

Could you explain why kind of “performance in the CUDA side” issues you are seeing that warrants a device reset?
I don’t think you can recover from it on the PyTorch side and would also need to restart the Python process.

Got it, it was actually an error on our side that affected the performance. Thank you for the help!