torch.cat with tensors on different devices

import torch

a = torch.randn(1, 2).cuda(1)
b = torch.randn(1, 2).cuda(2)
c = torch.cat([a, b])

This works fine (`c` ends up on device 1, the device of the first input).

But usually you can't compute ops between tensors on different devices (e.g. `a + b` raises an error).
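To make the asymmetry concrete, here is a minimal sketch that tries both operations. It assumes a machine with at least two CUDA devices and falls back gracefully otherwise; the function name `check_cross_device_ops` is hypothetical, and the exact `torch.cat` behavior may depend on the PyTorch version:

```python
import torch

def check_cross_device_ops():
    """Compare elementwise add and torch.cat on tensors from two GPUs."""
    if torch.cuda.device_count() < 2:
        # Cannot demonstrate cross-device behavior without two GPUs.
        return "needs >= 2 GPUs"
    a = torch.randn(1, 2).cuda(0)
    b = torch.randn(1, 2).cuda(1)
    results = {}
    # Elementwise ops require both operands on the same device.
    try:
        _ = a + b
        results["add"] = "ok"
    except RuntimeError:
        results["add"] = "RuntimeError"
    # torch.cat may copy inputs to one device, or raise, depending on version.
    try:
        c = torch.cat([a, b])
        results["cat"] = str(c.device)
    except RuntimeError:
        results["cat"] = "RuntimeError"
    return results

print(check_cross_device_ops())
```

On the setup described in this post, `add` fails with a `RuntimeError` while `cat` succeeds and places the result on the first input's device.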

And I see these comments in the source:

  // We parallelize the copy if all 6 conditions pass:
  //
  // 1. There is more than one input tensor
  // 2. No empty inputs
  // 3. The result tensor is 32-bit indexable
  // 4. The number of dimensions is <= 4
  // 5. All input tensors are contiguous (output tensor may be non-contig)
  // 6. All input tensors can use 32-bit indexing
  // 7. All input tensors are on the same device

The comments are at https://github.com/pytorch/pytorch/blob/c25e33789e45e20d3fa317f60e29bdbe49bffd28/aten/src/THC/generic/THCTensorMath.cu#L131
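For illustration, the quoted conditions can be restated as a Python predicate over the input and result tensors. This is only a sketch: `could_parallelize_copy` is a hypothetical name, not a PyTorch API, and the real check lives in the CUDA C++ code linked above:

```python
import torch

MAX_32BIT = 2**31 - 1  # largest signed 32-bit index

def could_parallelize_copy(inputs, result):
    """Mirror of the conditions quoted above, one term per line."""
    return (
        len(inputs) > 1                                   # 1. more than one input
        and all(t.numel() > 0 for t in inputs)            # 2. no empty inputs
        and result.numel() <= MAX_32BIT                   # 3. result 32-bit indexable
        and result.dim() <= 4                             # 4. <= 4 dimensions
        and all(t.is_contiguous() for t in inputs)        # 5. inputs contiguous
        and all(t.numel() <= MAX_32BIT for t in inputs)   # 6. inputs 32-bit indexable
        and len({t.device for t in inputs}) == 1          # 7. all on one device
    )

xs = [torch.randn(2, 3), torch.randn(2, 3)]
print(could_parallelize_copy(xs, torch.cat(xs)))  # → True
```

Note that condition 7 only gates the *parallelized* copy path: when inputs are on different devices, the code apparently falls back to a slower per-tensor copy rather than raising.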

So, is concatenating tensors on different devices intended behavior?