I am facing an issue where my batch of size 16 seems to be automatically split into 4 batches of 4 when I run my code on 4 GPUs, and the output is not returned to me as a single batch of size 16.
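For context, a minimal sketch of the kind of multi-GPU wrapping in play here (nn.DataParallel is my assumption about where the splitting happens, since it scatters the input batch along dim 0 across the visible GPUs and gathers the results back along dim 0):

import torch
import torch.nn as nn

encoder = Encoder()
if torch.cuda.device_count() > 1:
    # batch of 16 -> 4 chunks of 4 on 4 GPUs, results gathered along dim 0
    encoder = nn.DataParallel(encoder)
encoder = encoder.cuda()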
Here is the relevant part of my training loop: I first print the input batch shape, then pass it to my model, and finally print the output shape. The encoder also prints the input shape as soon as it receives it (this is where the mismatch shows up when training on multiple GPUs).
print("Encoder input shape: ", input_tensor.shape)
encoder_output, (encoder_hidden, encoder_cell) = encoder(input_tensor)
decoder_input = torch.squeeze(encoder_hidden, 0)
print("Decoder input shape: ", decoder_input.shape)
class Encoder(nn.Module):
    ...

    def forward(self, context_panels):
        print("Input Context Panels shape: ", context_panels.shape)
        # encode each panel: (batch, panels, 160, 160) -> (batch, panels, 128)
        encoded_panels = self.panel_encoder(context_panels)
        print("Panel encoder output shape: ", encoded_panels.shape)
        # LSTM over the sequence of panel encodings
        output, (hidden, cell) = self.sequence_encoder(encoded_panels)
        return output, (hidden, cell)
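For reference, the shape conventions of nn.LSTM with batch_first=True (which the printed shapes below imply my sequence_encoder uses): the output is batch-first, but the hidden state is not. A standalone sketch with illustrative sizes:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
x = torch.randn(16, 3, 128)          # (batch, seq, features)
output, (hidden, cell) = lstm(x)
print(output.shape)                  # torch.Size([16, 3, 128]) -- batch is dim 0
print(hidden.shape)                  # torch.Size([1, 16, 128]) -- batch is dim 1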
This is the console output I get when training on the CPU, which is the expected behavior:
Encoder input shape: torch.Size([16, 3, 160, 160])
Input Context Panels shape: torch.Size([16, 3, 160, 160])
Panel encoder output shape: torch.Size([16, 3, 128])
Decoder input shape: torch.Size([16, 128])
However, this is what I get when I run on 4 GPUs:
Encoder input shape: torch.Size([16, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Panel encoder output shape: torch.Size([4, 3, 128])
Panel encoder output shape: torch.Size([4, 3, 128])
Panel encoder output shape: torch.Size([4, 3, 128])
Panel encoder output shape: torch.Size([4, 3, 128])
Decoder input shape: torch.Size([4, 4, 128])
Any feedback on what I am doing wrong is greatly appreciated!
EDIT:
So the encoder output from the LSTM is gathered back to the full batch size correctly, but the encoder hidden state is not reassembled in the same way, as I expected it to be.
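My current guess (an assumption about nn.DataParallel's default behavior) is that each returned tensor is gathered by concatenation along dim 0, which reassembles the batch-first output correctly but concatenates the hidden state along its num_layers dim instead:

import torch

# per-GPU hidden states: (num_layers=1, per_gpu_batch=4, hidden=128)
chunks = [torch.randn(1, 4, 128) for _ in range(4)]
print(torch.cat(chunks, dim=0).shape)  # torch.Size([4, 4, 128]) -- matches the log below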
CPU output:
Encoder input shape: torch.Size([16, 3, 160, 160])
Input Context Panels shape: torch.Size([16, 3, 160, 160])
Panel encoder output shape: torch.Size([16, 3, 128])
Encoder output shape: torch.Size([16, 3, 128])
Encoder hidden shape: torch.Size([1, 16, 128])
Decoder input shape: torch.Size([16, 128])
4 GPU output:
Encoder input shape: torch.Size([16, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Input Context Panels shape: torch.Size([4, 3, 160, 160])
Panel encoder output shape: torch.Size([4, 3, 128])
Panel encoder output shape: torch.Size([4, 3, 128])
Panel encoder output shape: torch.Size([4, 3, 128])
Panel encoder output shape: torch.Size([4, 3, 128])
Encoder output shape: torch.Size([16, 3, 128])
Encoder hidden shape: torch.Size([4, 4, 128])
Decoder input shape: torch.Size([4, 4, 128])
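If that is indeed the cause, one fix I can think of (a sketch, not verified) is to return the hidden and cell states batch-first from forward, so that DataParallel's dim-0 gather reassembles the full batch:

def forward(self, context_panels):
    encoded_panels = self.panel_encoder(context_panels)
    output, (hidden, cell) = self.sequence_encoder(encoded_panels)
    # (num_layers, batch, hidden) -> (batch, num_layers, hidden) so the
    # batch dim is dim 0 for DataParallel's gather
    return output, (hidden.transpose(0, 1).contiguous(),
                    cell.transpose(0, 1).contiguous())

The training loop would then squeeze dim 1 instead of dim 0, i.e. decoder_input = torch.squeeze(encoder_hidden, 1). Is this the right way to handle LSTM state under DataParallel, or is there a cleaner approach?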