RuntimeError: CUDA error: operation would make the legacy stream depend on a capturing blocking stream

I haven’t found much information while searching for an error we’ve been experiencing on our systems (it started once we added some NVIDIA A40 GPUs to our mix). The error appears when running some simple code like this:

import torch

device = torch.device("cuda")
x = torch.randn(10, 100).to(device)  # the error is raised here

If we use PyTorch 1.8.1+cu113 there is no issue; any more recent version of PyTorch triggers this error. If we use the older V100 GPUs (with PyTorch 1.11+cu102) it also works. I’ve tried PyTorch 1.9 through the latest release with cu113 and none of them work…except 1.8.1.

Is there a special flag or option to enable that will make this start working again?

If there’s more info you need on my environment, I can provide it. This was using miniconda with Python 3.9 on RHEL 8; I also upgraded to Python 3.10 and tried the same steps, with the same failure.
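For reference, a common way to gather the environment details PyTorch maintainers usually ask for is to print the build and runtime versions directly (PyTorch also ships a fuller report via `python -m torch.utils.collect_env`). A minimal sketch, assuming PyTorch is installed; the CUDA fields are `None`/`False` on CPU-only builds:

```python
import torch

# PyTorch version string, e.g. "1.12.1+cu113"
print(torch.__version__)

# CUDA toolkit version the binary was compiled against (None for CPU-only builds)
print(torch.version.cuda)

# Whether a CUDA-capable device is visible to this process
print(torch.cuda.is_available())
```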

Any help someone can offer would be greatly appreciated.

No, there is no special flag, and we couldn’t reproduce the issue as described in your cross post. You provided the information about your setup (installed driver, PyTorch binary used, etc.), and even a matching system on our side wasn’t able to raise this error.
I would thus recommend a clean reinstall of the drivers and then trying again.