import torch

# Allocate a tensor on GPU 0, then try to move it to GPU 1.
a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).cuda(0)
print(a)
a = a.cuda(1)
print(a)
When the GPUs are RTX 2080s, the output is
tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], device='cuda:0')
tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], device='cuda:1')
which matches my expectations. However, when the GPUs are RTX 4090s, the output is
tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], device='cuda:0')
tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:1')
It looks as though the memory allocation on cuda:1 succeeded, but the data was never actually copied. Do the 40-series GPUs require additional configuration steps for device-to-device copies?
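For reference, here is a minimal diagnostic sketch, assuming two visible devices (cuda:0 and cuda:1): it asks PyTorch whether peer-to-peer access between the GPUs is reported as available, and then stages the copy through host memory so it does not depend on a direct device-to-device transfer. This is only meant to narrow the problem down, not to explain the failure.

import torch

# Does PyTorch report that GPU 0 can directly access GPU 1's memory?
# If this prints False, the direct device-to-device path may not be usable.
print(torch.cuda.can_device_access_peer(0, 1))

a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).cuda(0)

# Workaround sketch: route the copy through host (CPU) memory instead of
# relying on a direct GPU-to-GPU transfer.
a_staged = a.cpu().cuda(1)
print(a_staged)

If the staged copy prints the correct values while a.cuda(1) does not, that would point at the direct device-to-device transfer path rather than at the allocation.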