Unexpected output size when executing torch.squeeze() with multiple GPUs

The example code is:

# input_size: N * 1024 * 1
output = torch.squeeze(input)
# output_size: N * 1024

where N is batch_size.

  • When the code runs on a single GPU, the output size is correct, i.e., N * 1024.
  • However, when using multiple GPUs (for example, 4 GPUs), the output size is weird:
    1. When N is large, e.g. 64, the output size is correct.
    2. When N is small, e.g. N=2, the output size will be 2048…

PS: when using output = input.squeeze(-1), the output size is always correct.

When called without a dim argument, torch.squeeze removes all dimensions of size 1:

x = torch.randn(1, 2, 1, 2, 1, 2, 1)
print(torch.squeeze(x).shape)
> torch.Size([2, 2, 2])

In the case of a small batch size, each device might get a chunk where N==1, in which case the batch dimension is squeezed as well.
To avoid such errors, I would recommend using your second approach and specifying which dimension should be squeezed.
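
To make the shape arithmetic concrete, here is a minimal pure-Python sketch that only tracks shapes, so it needs neither GPUs nor PyTorch. It assumes the batch is split along dim 0 into chunks of size ceil(N / num_gpus) and that the per-device outputs are concatenated along dim 0 again; the helper names (squeeze_shape, chunk_batch, gathered_shape) are hypothetical and not part of any library.

```python
import math

def squeeze_shape(shape, dim=None):
    """Shape after squeeze: drop every size-1 dim, or only `dim` if it has size 1."""
    if dim is None:
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # allow negative indices such as -1
    return tuple(s for i, s in enumerate(shape) if not (i == dim and s == 1))

def chunk_batch(shape, num_chunks):
    """Assumed scatter rule: split dim 0 into pieces of size ceil(N / num_chunks)."""
    n = shape[0]
    step = math.ceil(n / num_chunks)
    return [(min(step, n - start),) + tuple(shape[1:]) for start in range(0, n, step)]

def gathered_shape(shape, num_gpus, dim=None):
    """Squeeze each per-device chunk, then concatenate the results along dim 0."""
    outs = [squeeze_shape(chunk, dim) for chunk in chunk_batch(shape, num_gpus)]
    return (sum(o[0] for o in outs),) + outs[0][1:]

print(gathered_shape((64, 1024, 1), 4))      # (64, 1024)
print(gathered_shape((2, 1024, 1), 4))       # (2048,)
print(gathered_shape((2, 1024, 1), 4, -1))   # (2, 1024)
```

With N=2, each chunk has shape (1, 1024, 1), so an unqualified squeeze leaves (1024,) per device, and concatenating two of those gives the 2048 reported above. Passing the dimension explicitly keeps the batch dimension intact.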
