Let's assume a tensor x = [[2, 2], [2, 2]] on the CPU.
Is there a torch scatter operation that splits the data and puts the pieces on 2 GPUs,
like x1 = [2, 2] on GPU 0
and x2 = [2, 2] on GPU 1?
torch.distributed seems to keep the full tensor on GPU 0 and then scatter it to the other GPUs?
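
To make the desired result concrete, here is a minimal sketch of what I mean, done by hand with torch.chunk and Tensor.to rather than a built-in scatter. The helper name split_to_devices is made up for illustration, and the CPU fallback is only there so the snippet runs on a machine without two GPUs. Note that torch.chunk keeps the leading dimension, so each piece has shape (1, 2) rather than (2,).

```python
import torch

def split_to_devices(x, devices):
    # Hypothetical helper: split x along dim 0 into one chunk per device.
    # The chunks are views of the CPU tensor, so nothing is copied yet.
    chunks = torch.chunk(x, len(devices), dim=0)
    # Copy each chunk from the host straight to its own target device,
    # so the full tensor is never placed on GPU 0 first.
    return [c.to(d) for c, d in zip(chunks, devices)]

x = torch.tensor([[2, 2], [2, 2]])  # lives on the CPU

if torch.cuda.device_count() >= 2:
    devices = ["cuda:0", "cuda:1"]
else:
    devices = ["cpu", "cpu"]  # assumption: fallback so the sketch still runs

x1, x2 = split_to_devices(x, devices)
print(x1.device, x1.shape)
print(x2.device, x2.shape)
```

This works for a single process, but I am looking for an equivalent built-in operation if one exists.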