I am using a Faster R-CNN model with a ResNet-50 + FPN backbone. These are the parameters relevant to this question:
 Minibatch size = 6 RGB images
 Number of GPUs = 1
 Input minibatch tensor size = torch.Size([6, 3, 768, 1184])
According to the ResNet-50 architecture, the first convolution layer is defined as follows: .conv1 = Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False, norm=get_norm(norm, out_channels))
After the model runs this first convolution, the conv1 output tensor size is torch.Size([6, 64, 384, 592]).
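For reference, the shape arithmetic above can be reproduced with a minimal standalone layer (a plain torch.nn.Conv2d with the same hyperparameters; this is a sketch, not Detectron2's norm-wrapped Conv2d):

```python
import torch
import torch.nn as nn

# Stand-in for ResNet-50's stem conv (omits the norm layer from the Detectron2 version)
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(6, 3, 768, 1184)  # minibatch of 6 RGB images
y = conv1(x)

# Each of the 64 filters has shape [3, 7, 7]: one 7x7 kernel per input channel
print(conv1.weight.shape)  # torch.Size([64, 3, 7, 7])

# Output spatial size: floor((768 + 2*3 - 7) / 2) + 1 = 384,
#                      floor((1184 + 2*3 - 7) / 2) + 1 = 592
print(y.shape)             # torch.Size([6, 64, 384, 592])
```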
I have four questions:

Going by the conv1 tensor size, do we get 64 feature maps for each image, or are the feature maps averaged and shared across the 6 images?

How does the convolution take place between the input and conv1, given that we have 6 images?

Does each of the 64 filters apply across all three RGB channels of an image, or does each filter apply directly to the colored pixel values without splitting them into separate channels?

I understand that at the end of a forward pass, the loss and gradients are averaged over the minibatch size. Are the weights updated iteratively for each image, or once per minibatch?