Data transfer between GPUs

import torch

# create a tensor on GPU 0, then move it to GPU 1
a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).cuda(0)
print(a)
a = a.cuda(1)
print(a)

When the GPUs are RTX 2080s, the output is

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], device='cuda:0')
tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], device='cuda:1')

which matches my expectations. However, when the GPUs are RTX 4090s, the output is

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], device='cuda:0')
tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0], device='cuda:1')

It seems that the memory allocation succeeded on cuda:1, but the data was never actually copied. Do the 40xx-series GPUs require additional configuration steps?

Disable P2P support on your 40xx-series GPUs, or update your NVIDIA driver. The 40xx series does not support P2P (direct GPU-to-GPU) transfers, but older drivers incorrectly reported it as available and allowed it to be used, which can lead to exactly this kind of silent data corruption.
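
As a sanity check, you can ask PyTorch whether the driver reports a working P2P link with `torch.cuda.can_device_access_peer`, and fall back to staging the copy through host RAM when it doesn't. A minimal sketch (the helper names `p2p_available` and `staged_copy` are mine, not a PyTorch API):

```python
import torch

def p2p_available(src: int, dst: int) -> bool:
    # True only when the driver reports a working P2P link between the two GPUs
    return torch.cuda.can_device_access_peer(src, dst)

def staged_copy(t: torch.Tensor, device) -> torch.Tensor:
    # Route the copy through host RAM, avoiding the (possibly broken) P2P path
    return t.cpu().to(device)

if torch.cuda.device_count() >= 2:
    a = torch.arange(1, 11).cuda(0)
    if p2p_available(0, 1):
        b = a.cuda(1)                 # direct GPU-to-GPU copy is safe
    else:
        b = staged_copy(a, "cuda:1")  # detour through the host
    print(b)
```

The CPU detour is slower than a direct transfer, but it is correct on hardware without P2P support; with a fixed driver, `a.cuda(1)` on a 40xx card will take this host-staged path internally anyway.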