When I was using PyTorch 2.1, I ran into the following weird behavior. Is this a bug?
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
8
>>> torch.cuda.is_available()
True
>>> torch.__version__
'2.1.0+cu121'
>>> temp = torch.tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 38, 38]], device='cuda:0')
>>> temp
tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
36, 36, 36, 37, 37, 37, 38, 38]], device='cuda:0')
>>> temp.to('cuda:1')
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], device='cuda:1')
>>> temp.to('cpu' )
tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
36, 36, 36, 37, 37, 37, 38, 38]])
>>> temp.to('cuda:1' )
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], device='cuda:1')
>>> temp.to('cuda:2' )
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], device='cuda:2')
>>> temp.to('cuda:3' )
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], device='cuda:3')
>>> temp.to('cuda:4' )
tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
36, 36, 36, 37, 37, 37, 38, 38]], device='cuda:4')
>>> temp.to('cuda:5' )
tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
36, 36, 36, 37, 37, 37, 38, 38]], device='cuda:5')
>>> temp.to('cuda:6' )
tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
36, 36, 36, 37, 37, 37, 38, 38]], device='cuda:6')
>>> temp.to('cuda:7' )
tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
36, 36, 36, 37, 37, 37, 38, 38]], device='cuda:7')
>>>
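One way to narrow this down might be to check whether PyTorch even reports peer-to-peer access between the device pairs that produce zeros. This is a minimal sketch (not part of my original session; `p2p_matrix` is just a helper name I made up):

```python
import torch

def p2p_matrix(n):
    """Return an n x n matrix of peer-access flags (diagonal is True)."""
    return [[i == j or torch.cuda.can_device_access_peer(i, j)
             for j in range(n)] for i in range(n)]

if torch.cuda.is_available():
    n = torch.cuda.device_count()
    for i, row in enumerate(p2p_matrix(n)):
        print(f"cuda:{i} ->", ["ok" if ok else "NO P2P" for ok in row])
```

If the pairs that return zeros above are exactly the pairs where `can_device_access_peer` is True (or False), that would be a useful data point for diagnosing whether this is a peer-to-peer transfer problem.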
I reproduced this under different versions of torch (such as torch 2.0.1+cu118) and on different servers (A6000 and A6000 Ada), and the problem was always present. The test script is as follows:
import torch

gpus = ['cuda:' + str(i) for i in range(8)]
for i in range(8):
    cur_device = gpus[i]
    temp = torch.tensor([[33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 38, 38]], device=cur_device)
    print('>>>>>>> ', temp, ' <<<<<<<')
    for j in range(8):
        if i == j:
            continue
        to_device = gpus[j]
        # copy via the CPU, then to the target GPU
        _temp = temp.cpu().to(device=to_device)
        print(cur_device, ' -> ', to_device)
        print(_temp)
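Note that the interactive session above copies device-to-device directly (`temp.to('cuda:1')`), while this script routes through the CPU (`temp.cpu().to(...)`). To exercise the same direct path as the session, and to flag corrupted transfers automatically instead of eyeballing printed tensors, a variant could compare each direct copy against the CPU copy of the source. A sketch along those lines (`check_direct_copies` is a name I made up for illustration):

```python
import torch

def check_direct_copies(values, n_gpus):
    """Copy a tensor from each GPU directly to every other GPU and
    return the (src, dst) pairs where the data no longer matches."""
    bad_pairs = []
    for i in range(n_gpus):
        src = torch.tensor(values, device=f"cuda:{i}")
        expected = src.cpu()
        for j in range(n_gpus):
            if i == j:
                continue
            dst = src.to(f"cuda:{j}")  # direct device-to-device copy
            if not torch.equal(dst.cpu(), expected):
                bad_pairs.append((i, j))
    return bad_pairs

if torch.cuda.is_available():
    pairs = check_direct_copies([33, 34, 35], torch.cuda.device_count())
    print("corrupted transfers:", pairs or "none")
```

On the servers above I would expect this to report pairs like (0, 1), (0, 2), (0, 3) if the direct copies really are producing zeros.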