Multiple inputs in VGG16

Hello everyone!
I wanted to ask how I could add a first layer to the pre-trained VGG16 so that the input to the network is 3 RGB images, i.e. an input shape of 500x500x3x3. So far I have only seen how to change the first layer to alter the image dimensions, for example from 500x500x3 to 500x500x1, but I don't know how to add that 4th dimension.

Thank you in advance

How should this layer work for 3 input images?
I.e. do you want to apply an nn.Conv3d layer and treat the multiple images as the depth dimension, or apply an nn.Conv2d layer to each image?
In the latter case, how would the output be defined?
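
To make the two options concrete, here is a rough sketch of both, not a definitive answer. The tensor layout, the reduced 64x64 image size, and merging the per-image outputs with a mean are all assumptions made for illustration; only nn.Conv3d and nn.Conv2d come from the discussion above. Both variants produce 64 output channels, which is what the first convolution of torchvision's VGG16 outputs, so in principle the remaining pre-trained layers could follow either one.

```python
import torch
import torch.nn as nn

# Assumed layout: [batch, images, channels, height, width].
# 64x64 is used instead of 500x500 only to keep the sketch fast.
x = torch.randn(2, 3, 3, 64, 64)
n, k, c, h, w = x.shape

# Option A: treat the 3 images as the depth dimension of an nn.Conv3d.
# Conv3d expects [batch, channels, depth, H, W], so swap the image and channel axes.
conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                   kernel_size=(3, 3, 3), padding=(0, 1, 1))
out_a = conv3d(x.permute(0, 2, 1, 3, 4))  # depth 3 with kernel 3, no padding -> depth 1
out_a = out_a.squeeze(2)                  # drop the singleton depth dimension

# Option B: apply the same nn.Conv2d to each image, then merge the results.
# Folding the image axis into the batch axis shares the weights across images.
conv2d = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
out_b = conv2d(x.reshape(n * k, c, h, w))          # one feature map per image
out_b = out_b.reshape(n, k, 64, h, w).mean(dim=1)  # assumed merge: average over images

print(out_a.shape, out_b.shape)
```

In option B the mean is just one possible way to define the output; concatenating the per-image feature maps along the channel dimension would be another, but then the next layer would need to accept 192 input channels instead of 64.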