I haven’t found much information while searching for an error we’ve been seeing on our systems since we added some NVIDIA A40 GPUs to the mix. The error appears when running simple code like this:
import torch

device = torch.device("cuda")
x = torch.randn(10, 100).to(device)
With PyTorch 1.8.1-cu113 there is no issue, but every more recent version of PyTorch triggers this problem. On the older V100 GPUs (with PyTorch 1.11-cu102) everything works. I’ve tried PyTorch 1.9 through the latest with cu113, and none of them work except 1.8.1.
Is there a special flag or option I can enable to make this start working again?
If you need more info on my environment, just let me know. This was with Miniconda and Python 3.9 on RHEL 8; I also upgraded to Python 3.10 and tried the same things, with no luck.
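For completeness, here is a small diagnostic snippet I can run to report the relevant details. It only uses standard `torch` attributes (nothing specific to our setup), so it should work on any install:

```python
import torch

# Versions of PyTorch and the CUDA toolkit it was built against.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Name and compute capability of the first GPU, plus the
    # architectures this PyTorch build was compiled for.
    print("Device:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
    print("Compiled arch list:", torch.cuda.get_arch_list())
```

I can post the output of this from both the A40 and V100 machines if that helps.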
Any help someone can offer would be greatly appreciated.