Thanks. Here is the forward method with the print statements:
def forward(self, input, input_lengths, images):
    ptvsd.break_into_debugger()
    print("Input", input.shape, input.get_device())
    input = input.permute((1, 0))  # input.reshape((input.shape[1], input.shape[0]))
    print("Permuted Input", input.shape, input.get_device())
    embedded = self.embedding(input)
    print("Embedded", embedded.shape, embedded.get_device())
    # packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
    output = self.transformer_encoder_layer(embedded)
    print("Output", output.shape, output.get_device())
    # output, _ = torch.nn.utils.rnn.pad_packed_sequence(output)
    image_encodings = self.encoder_cnn(images)
    print("Image Encodings", image_encodings.shape, image_encodings.get_device())
    return output, image_encodings
And the output is:
Input torch.Size([64, 30]) 0
Input torch.Size([64, 30]) 1
Permuted Input torch.Size([30, 64]) 0
Input torch.Size([64, 30]) 2
Permuted Input torch.Size([30, 64]) 1
Permuted Input torch.Size([30, 64]) 2
Input torch.Size([64, 30]) 3
Embedded torch.Size([30, 64, 300]) 0
Embedded torch.Size([30, 64, 300]) 2
Embedded torch.Size([30, 64, 300]) 1
Output torch.Size([30, 64, 300]) 1
Output torch.Size([30, 64, 300]) 0
Output torch.Size([30, 64, 300]) 2
Permuted Input torch.Size([30, 64]) 3
Image Encodings torch.Size([64, 300]) 1
Image Encodings torch.Size([64, 300]) 2
Image Encodings torch.Size([64, 300]) 0
Embedded torch.Size([30, 64, 300]) 3
Output torch.Size([30, 64, 300]) 3
Image Encodings torch.Size([64, 300]) 3
And the returned tensors have the shapes:
shape:torch.Size([120, 64, 300])
device:device(type='cuda', index=0)
and
shape:torch.Size([256, 300])
device:device(type='cuda', index=0)
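The gathered shapes above are consistent with `nn.DataParallel` concatenating each GPU's return value along dim 0. A minimal sketch of that bookkeeping, using only shapes (no GPUs needed; `gathered_shape` is a hypothetical helper written for illustration, not a PyTorch API):

```python
# nn.DataParallel gathers per-replica outputs with a concatenation along dim 0.
# Model a 4-GPU gather purely in terms of shapes (hypothetical helper).
def gathered_shape(per_gpu_shape, n_gpus):
    # Concatenation along dim 0 multiplies only the first dimension.
    return (per_gpu_shape[0] * n_gpus,) + tuple(per_gpu_shape[1:])

# Transformer output is (seq_len, batch, features) = (30, 64, 300) per GPU,
# so dim 0 is the sequence length, not the batch -> it grows to 120.
print(gathered_shape((30, 64, 300), 4))  # (120, 64, 300)

# Image encodings are (batch, features) = (64, 300) per GPU,
# so dim 0 is the batch -> it grows to the full batch of 256.
print(gathered_shape((64, 300), 4))      # (256, 300)
```

This matches the returned shapes: after the permute the transformer output carries the batch in dim 1, so the gather concatenates along the sequence dimension (30 × 4 = 120), while the image encodings keep the batch in dim 0 and gather to the full batch (64 × 4 = 256) as expected.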