Hi,
I was training a network on a single GPU and everything was fine.
Since GPU utilization was a bit low, I decided to do the preprocessing on a second GPU, allocating tensors in the dataset's __getitem__ and working on the main thread.
Everything was still OK.
Then I realized that when I move my ground truth from cuda:1 to cuda:0, the tensor changes into a completely different one.
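Roughly, the setup looks like this (a minimal sketch with made-up names, not my actual code):
import torch
from torch.utils.data import Dataset, DataLoader

class PreprocOnGpu1(Dataset):
    # preprocessing happens directly on cuda:1
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        img = torch.rand(1, 256, 256, device='cuda:1')
        gt = (torch.rand(1, 256, 256, device='cuda:1') > 0.5).float()
        return img, gt

loader = DataLoader(PreprocOnGpu1(), batch_size=4, num_workers=0)  # main thread

for img, gt in loader:
    img = img.to('cuda:0')
    gt = gt.to('cuda:0')   # here the values come out completely different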
(I'm trying to reproduce it, but the console crashed without freeing GPU memory… )
Any idea?
I don’t have an idea yet, but it sounds like a bug so we would need to dig into it.
Could you explain your use case a bit so that we might try to work on a reproduction as well?
Well, it's nothing out of the ordinary.
I could reproduce something similar just by using the console.
import torch
a = (torch.rand(70, 1, 256, 256) > 0.5).float().cuda(1)
b = a.cuda()  # Same effect with a.to('cuda:0')
After restarting the computer and running the code above, b is all zeros.
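For what it's worth, comparing both tensors through the CPU shows the mismatch directly (a quick sketch):
import torch

a = (torch.rand(70, 1, 256, 256) > 0.5).float().cuda(1)
b = a.cuda()                           # cuda:1 -> cuda:0

print(torch.equal(a.cpu(), b.cpu()))   # False when the copy is broken
print(b.abs().max().item())            # 0.0, i.e. b is all zeros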
I originally discovered the issue in a complex pipeline which involves STFT, functional.grid_sample and einsum, but I think that's irrelevant.
Originally the tensor was stored in a dictionary:
vars['gt'] = vars['gt'].to(torch.device(0))
Afterwards vars['gt'] became a tensor bounded between -1 and 5 on GPU 0.
I even tried
import torch
a = (torch.rand(70, 1, 256, 256) > 0.5).float().cuda(1)
b = a.cuda()  # Same effect with a.to('cuda:0')
c = b.cpu()
just to force it onto the CPU (to avoid sync problems), but that didn't help. Neither did using torch.cuda.synchronize() between commands.
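That is, roughly something like this (a sketch of what I mean):
import torch

a = (torch.rand(70, 1, 256, 256) > 0.5).float().cuda(1)
with torch.cuda.device(1):
    torch.cuda.synchronize()   # wait for the work on cuda:1 to finish
b = a.cuda()                   # cuda:1 -> cuda:0
with torch.cuda.device(0):
    torch.cuda.synchronize()   # wait for the copy on cuda:0 to finish
c = b.cpu()                    # still comes back wrong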
Torch version is 1.2.0 according to torch.__version__
CUDA version is 10.0.130
IPython on Python 3.6.8, the default version
NVIDIA driver 410.48 (from nvidia-smi)
cuda:0: Quadro P6000
cuda:1: GeForce GTX 1080 Ti
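For reference, these can also be queried from Python (a quick sketch):
import torch

print(torch.__version__)               # 1.2.0
print(torch.version.cuda)              # 10.0.130
print(torch.cuda.get_device_name(0))   # Quadro P6000
print(torch.cuda.get_device_name(1))   # GeForce GTX 1080 Ti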
Could you update PyTorch to the latest version, please, and retry the code?
If I’m not mistaken, we’ve seen a similar issue some time ago, which boiled down to a hardware issue, but I can’t find the post.
Maybe @albanD remembers it.
Hi @ptrblck, @albanD
It keeps happening with PyTorch 1.4 and driver 440.
I discovered something interesting.
It does happen going from the GTX 1080 Ti to the P6000, but not the other way around.
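Roughly what I mean, with cuda:0 = P6000 and cuda:1 = GTX 1080 Ti (a sketch):
import torch

a = (torch.rand(70, 1, 256, 256) > 0.5).float()

bad = a.cuda(1).to('cuda:0')        # GTX 1080 Ti -> P6000
print(torch.equal(a, bad.cpu()))    # False, comes back all zeros

good = a.cuda(0).to('cuda:1')       # P6000 -> GTX 1080 Ti
print(torch.equal(a, good.cpu()))   # True, values survive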
Fail to import hypothesis in common_utils, tests are not derandomized
test_foreach raises
AttributeError: module 'torch' has no attribute '_foreach_add'
and test_torch raises
File "test_torch.py", line 29, in <module>
from torch.testing._internal.common_utils import TestCase, iter_indices, TEST_NUMPY, TEST_SCIPY, \
ImportError: cannot import name 'wrapDeterministicFlagAPITest'
This kind of error (AttributeError: module 'torch' has no attribute '_foreach_add', where you're missing a C++ API) usually means that the Python code of your install is not the same version as the binary install.
That happens if you do setup.py develop and then update your local repo.
You might want to clean up your install here.
If you're using a binary install, it means that you either have both a binary and a develop install at the same time that conflict, or you have a folder called torch in your current directory.
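A quick way to see which install actually gets picked up (a sketch):
import torch

print(torch.__file__)             # site-packages -> binary install, source checkout -> develop install
print(torch.__version__)
print(torch.version.git_version)  # commit the build came from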
I'm experiencing this now on torch 1.13.1 on a K80. (The newer driver for torch 2 doesn't support the K80 (edit: I guess this was a quirk of the image with a newer NCCL; maybe I will try torch 2).)
I have the tests running. It's a little confusing because PyTorch Docker images can come with torch installed via conda, which can clash with a source install.
It seems like ideally there'd be a way to log the CUDA calls and narrow the problem down further.
I have another K80 I can swap in to try, too, although they’ve both weathered the same storms and could be broken similarly if there’s damage.
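In the meantime, a minimal round-trip check like this (just a sketch, assuming two visible devices) at least tells me whether a plain cross-device copy and the reported peer access look sane:
import torch

print(torch.cuda.can_device_access_peer(0, 1))   # is P2P access reported between the two cards?

x = torch.arange(1024, device='cuda:0')
y = x.to('cuda:1')
print(torch.equal(x.cpu(), y.cpu()))              # True on a healthy setup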
I encountered a similar issue when I was using a machine with multiple RTX4090 GPUs.
It can be reproduced simply by running this snippet on the console:
>>> import torch
>>> l = torch.tensor(1, device='cuda:0')
>>> l.to('cuda:1')
tensor(0, device='cuda:1')
It happens with both PyTorch 2.0.0 and 1.10.0 on two RTX 4090s, but it does NOT happen on two RTX 3090 GPUs (PyTorch 1.10.0).
The issue can temporarily be solved by moving to CPU first and then to the second GPU:
>>> l = torch.tensor(1, device='cuda:0')
>>> l.to('cpu').to('cuda:1')
tensor(1, device='cuda:1')
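If it helps, the workaround can be wrapped in a tiny helper (hypothetical name, just a sketch):
import torch

def to_via_cpu(t, device):
    # Hop through the CPU to avoid the broken direct GPU-to-GPU copy.
    return t.to('cpu').to(device)

l = torch.tensor(1, device='cuda:0')
print(to_via_cpu(l, 'cuda:1'))   # tensor(1, device='cuda:1')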