You should double-check your padding approach: it currently changes the channel dimension, while I would guess you want to change the spatial size.
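For reference, F.pad reads its pad argument as (before, after) pairs starting from the last dimension, so the length of the tuple decides which dimensions are touched. The sketch below is pure Python and only tracks shapes; `padded_shape` is a hypothetical helper written for illustration, not a PyTorch function, and the pad values are made-up examples, since your actual F.pad call isn't shown.

```python
def padded_shape(shape, pad):
    """Shape that F.pad would return for `pad` (hypothetical helper).

    `pad` holds (before, after) pairs; the first pair applies to the
    LAST dimension, the second pair to the one before it, and so on.
    Negative values crop instead of pad.
    """
    if len(pad) % 2 != 0 or len(pad) // 2 > len(shape):
        raise ValueError("pad needs (before, after) pairs for at most len(shape) dims")
    out = list(shape)
    for i in range(len(pad) // 2):
        dim = len(shape) - 1 - i  # walk backwards from the last dimension
        out[dim] += pad[2 * i] + pad[2 * i + 1]
    return tuple(out)


x3_1_shape = (8, 21, 258, 258)  # [N, C, H, W], taken from the printout

# Four values: only W and H change (spatial padding).
spatial = padded_shape(x3_1_shape, (1, 1, 1, 1))

# Six values: the third pair lands on the channel dimension.
channels = padded_shape(x3_1_shape, (0, 0, 0, 0, 10, 11))

# Negative values on the channel pair crop channels, e.g. 128 -> 42.
cropped = padded_shape((8, 128, 258, 258), (0, 0, 0, 0, -43, -43))

print(spatial)   # (8, 21, 260, 260)
print(channels)  # (8, 42, 258, 258)
print(cropped)   # (8, 42, 258, 258)
```

If the intent is to pad H and W only, a four-element pad tuple is enough and the channel counts stay as they are.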
Shapes before applying F.pad:
print(x1.shape, x2_1.shape, x2_2.shape, x3_1.shape, x3_2.shape, x4.shape)
# torch.Size([8, 128, 258, 258]) torch.Size([8, 42, 258, 258]) torch.Size([8, 42, 258, 258]) torch.Size([8, 21, 258, 258]) torch.Size([8, 21, 258, 258]) torch.Size([8, 42, 258, 258])
After:
print(x1.shape, x2_1.shape, x2_2.shape, x3_1.shape, x3_2.shape, x4.shape)
# torch.Size([8, 42, 258, 258]) torch.Size([8, 42, 258, 258]) torch.Size([8, 42, 258, 258]) torch.Size([8, 42, 258, 258]) torch.Size([8, 42, 258, 258]) torch.Size([8, 42, 258, 258])
If this is desired, change it to deconv1_input_channels = 6 * (inception_out_channels // 3) and it should work.
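A quick sanity check of that arithmetic, as a sketch under two assumptions: that inception_out_channels is 128 (inferred from the 128-channel x1 and from 128 // 3 == 42), and that the six padded tensors are concatenated along the channel dimension before deconv1.

```python
# Assumption: 128 is inferred from the printed shapes, not taken from your code.
inception_out_channels = 128

# After padding, every branch has inception_out_channels // 3 channels.
branch_channels = inception_out_channels // 3

# Shapes of x1, x2_1, x2_2, x3_1, x3_2, x4 after F.pad, per the printout.
branch_shapes = [(8, branch_channels, 258, 258)] * 6

# Concatenating along dim=1 adds up the channel counts (assumed merge step).
concat_channels = sum(shape[1] for shape in branch_shapes)

deconv1_input_channels = 6 * (inception_out_channels // 3)

print(branch_channels)         # 42
print(concat_channels)         # 252
print(deconv1_input_channels)  # 252
```

So deconv1 would need to accept 252 input channels for the concatenated tensor to fit.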