Torch.prod produces RuntimeError: CUDA driver error: invalid argument

Roee_Shenberg · August 29, 2023, 1:41pm

Just had this happen to me with Python 3.10 in a virtualenv. The system has CUDA 12.2 installed, and the virtualenv has PyTorch 2.0.1 built for CUDA 11.8 (cu118). Erasing the files in ~/.cache/torch/kernels solved it for me too.
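For anyone who wants to script the cache clearing described above, here is a minimal sketch. It assumes the default location ~/.cache/torch/kernels (no custom cache path configured), and clear_torch_kernel_cache is a hypothetical helper name, not a PyTorch API. It empties the directory but leaves the directory itself in place.

```python
import shutil
from pathlib import Path
from typing import Union

# Default kernel cache location mentioned in this thread (assumption: no
# custom cache path has been configured).
DEFAULT_KERNEL_CACHE = Path.home() / ".cache" / "torch" / "kernels"


def clear_torch_kernel_cache(cache_dir: Union[str, Path] = DEFAULT_KERNEL_CACHE) -> int:
    """Delete every entry under cache_dir and return how many were removed.

    Hypothetical helper: the directory itself is kept, and a missing
    directory is treated as already empty.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove nested cache subdirectories
        else:
            entry.unlink()  # remove cached kernel files (and symlinks)
        removed += 1
    return removed
```

Calling clear_torch_kernel_cache() before a run clears the default location. As later replies in this thread note, the cache is repopulated on the next run, so this is a workaround rather than a fix.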

A_Y · October 2, 2023, 3:59pm

Cleaning the cache works for me too, but it looks like we need to delete it every time we want to run something; otherwise the error appears again. I suppose this approach also rules out running jobs in parallel, since deleting the cache might affect other runs. Is there an official solution yet? @ptrblck

HarborYuan · October 8, 2023, 3:01pm

In my case, cleaning the cache does not help. I solved the problem by manually setting the CUDA_HOME environment variable to a CUDA installation whose version matches the one PyTorch was built with.
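One way to check whether the two versions agree is to compare the CUDA version PyTorch was built with (torch.version.cuda, e.g. "11.8" for a cu118 wheel) against the release reported by nvcc in the directory CUDA_HOME points to. The sketch below only parses and compares version strings. The helper names are hypothetical, it assumes nvcc prints a line of the form "Cuda compilation tools, release 11.8, V11.8.89", and comparing only major.minor is a judgment call.

```python
import re
from typing import Optional


def nvcc_release(nvcc_version_output: str) -> Optional[str]:
    """Extract "major.minor" from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+\.\d+)", nvcc_version_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: Optional[str], nvcc_output: str) -> bool:
    """Return True if PyTorch's build CUDA version matches nvcc's major.minor.

    torch_cuda is expected to be the value of torch.version.cuda, which is
    None for CPU-only builds (nothing to match in that case).
    """
    release = nvcc_release(nvcc_output)
    if torch_cuda is None or release is None:
        return False
    return torch_cuda.split(".")[:2] == release.split(".")[:2]
```

In practice you might pass it torch.version.cuda together with the stdout of subprocess.run([os.path.join(os.environ["CUDA_HOME"], "bin", "nvcc"), "--version"], capture_output=True, text=True).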

Hope this can help in your case.

Lu_Lewis · February 4, 2024, 8:55am

Issue solved after removing the files under ~/.cache/torch/kernels/.

It’s weird. Could anybody explain this code-wise?

Ensiyeh_Raoufi · May 17, 2024, 3:35am

I have the same problem. Each time a new cache file is created, the error appears again. How did you solve this issue?

Ensiyeh_Raoufi · May 17, 2024, 3:36am

Could you please describe how I can do it? Thanks

ezyang · August 6, 2024, 2:00pm

The cache directory folks have been removing here is associated with the JITerator. You can also disable this cache persistently by setting the environment variable USE_PYTORCH_KERNEL_CACHE=0.
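A minimal sketch of applying this from Python, assuming the variable only needs to be present in the environment of the process that runs PyTorch. Setting it before torch is imported, or passing it to a child process, is the conservative choice; exporting USE_PYTORCH_KERNEL_CACHE=0 in your shell profile achieves the same thing persistently. env_without_kernel_cache is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def env_without_kernel_cache(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment with the JITerator kernel cache disabled.

    Uses the current process environment when base is None; the input
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["USE_PYTORCH_KERNEL_CACHE"] = "0"
    return env


# For the current process: set it before `import torch` so it is already in
# place before any kernel could be written to the cache.
os.environ["USE_PYTORCH_KERNEL_CACHE"] = "0"
```

For a separate job, something like subprocess.run(["python", "train.py"], env=env_without_kernel_cache()) would launch it with the cache disabled (train.py is a placeholder). This also avoids deleting a shared cache directory while other runs are using it.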