Hi, I have a rather simple question. I have two encoders, shown below. The first one uses ResNet-50 to extract a feature vector per example, and the output shape is (batch_size, encoding_dim), e.g. (32, 1000).
The second encoder removes the last two layers (pooling and fc) and extracts feature maps that keep the spatial dimensions, so it outputs (batch_size, 2048, 19, 19).
After the feature encoder, I then have the classifier layers.
My question is: how do the forward pass and backward pass happen here (in the first encoder and in the second)? If it's the same in both, are the examples in each batch passed through the forward pass one at a time, or all at the same time?
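To make the batching question concrete, here is a minimal sketch (using a hypothetical toy `nn.Linear` module, not the networks below) showing that the whole batch tensor goes through `forward()` in a single call, and that `backward()` is likewise computed for the whole batch at once:

```python
import torch
import torch.nn as nn

# A tiny stand-in module: 10 input features -> 3 outputs
toy = nn.Linear(10, 3)

batch = torch.randn(32, 10)   # 32 examples stacked into one tensor
out = toy(batch)              # one forward pass for the entire batch
print(out.shape)              # torch.Size([32, 3])

loss = out.sum()
loss.backward()               # one backward pass; gradients accumulate over the batch
print(toy.weight.grad.shape)  # torch.Size([3, 10]) -- same shape as the weight
```

The same applies to the ResNet encoders: the first dimension of the input tensor is the batch dimension, and every layer operates on all examples in it simultaneously.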
class My_Network_1(nn.Module):
    def __init__(self):
        super(My_Network_1, self).__init__()
        # Initialize the pretrained ResNet-50 feature extractor
        self.feature_extractor_1 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)

    def forward(self, x):
        # Input [(batch_size, channel_nr, width, height)]
        h_1 = self.feature_extractor_1(x)
        # Output [(batch_size, 1000)]
        # classifier Y = ....
        return Y, probs
class My_Network_2(nn.Module):
    def __init__(self):
        super(My_Network_2, self).__init__()
        # Load ResNet-50, then drop its last two layers (avgpool and fc)
        backbone = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
        self.feature_extractor_2 = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, x):
        # Input [(batch_size, channel_nr, width, height)]
        h_2 = self.feature_extractor_2(x)
        # Output size [(batch_size, 2048, 19, 19)]
        # classifier Y = ....
        return Y, probs