I have noticed some strange behaviour with torch.nn.Linear layers when running on GPU (CUDA) vs. CPU, when they are applied to tensors that have zero elements but a non-trivial shape. Here is a minimal example.
On CPU:
t = torch.zeros(10, 0)
lin = torch.nn.Linear(0, 13)  # this is basically a bias
lin(t).shape  # torch.Size([10, 13]), as expected
On GPU:
t = torch.zeros(10, 0).to("cuda")
lin = torch.nn.Linear(0, 13).to("cuda")  # this is basically a bias
lin(t).shape  # torch.Size([13]), and a warning is raised:
UserWarning: An output with one or more elements was resized since it had shape [n, p], which does not match the required output shape [p]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:24.)
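For convenience, here is the same reproduction as a self-contained script (the helper name check_empty_linear is mine); the expected-output comments for the GPU case reflect what I observed, not documented behaviour:

```python
import torch

def check_empty_linear(device: str) -> torch.Size:
    # With in_features=0, the matmul term x @ W.T contributes nothing,
    # so the output should just be the bias broadcast over the batch dim.
    t = torch.zeros(10, 0, device=device)
    lin = torch.nn.Linear(0, 13).to(device)
    return lin(t).shape

print(check_empty_linear("cpu"))  # torch.Size([10, 13])

if torch.cuda.is_available():
    # Observed: prints torch.Size([13]) and raises the resize UserWarning,
    # instead of torch.Size([10, 13]).
    print(check_empty_linear("cuda"))
```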
Might this be caused by CUDA-specific optimisations for empty tensors? Either way, I would at least expect the behaviour to be consistent between GPU and CPU.
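In case it helps others hitting the same warning, a possible workaround (a sketch, assuming the intended output of a Linear with in_features=0 really is just the broadcast bias) is to build the output from the bias explicitly, which behaves the same on both devices:

```python
import torch

t = torch.zeros(10, 0)  # or .to("cuda") on GPU
lin = torch.nn.Linear(0, 13)

# With in_features=0 the linear layer reduces to its bias, so broadcast
# the bias over the batch dimension instead of calling lin(t):
out = lin.bias.unsqueeze(0).expand(t.shape[0], -1)
print(out.shape)  # torch.Size([10, 13])
```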