Forward and backward pass in ResNet50

Anita · January 30, 2023, 2:19pm

Hi, I have a rather simple question. I have two encoders, shown below. The first one uses ResNet50 to extract features for each example, and its output shape is (batch_size, encoding_vector), e.g. [(32, 1000)].

The second encoder removes the last two layers (pooling and fc) and extracts feature maps keeping the spatial size and outputs [(batch_size, 2048, 19, 19)].

After the feature encoder, I then have the classifier layers.

My question is: how do the forward pass and backward pass happen here (in the first encoder and in the second)? If it’s the same in both, are the examples from each batch passed through the forward pass separately, or are they all passed at the same time?

class My_Network_1(nn.Module):
    def __init__(self):
        super(My_Network_1, self).__init__()

        # Initialize ResNet-50 feature extractor
        # (note: the hub entry 'resnext50_32x4d' loads ResNeXt-50 32x4d)
        self.feature_extractor_1 = torch.hub.load('pytorch/vision:v0.10.0', 'resnext50_32x4d', pretrained=True)

    def forward(self, x):
        # Input [(batch_size, channel_nr, height, width)]
        h_1 = self.feature_extractor_1(x)
        # Output [(batch_size, 1000)]

        # classifier Y = ....

        return Y, probs

class My_Network_2(nn.Module):
    def __init__(self):
        super(My_Network_2, self).__init__()

        # Load the backbone (assumed here to be the same model as in My_Network_1)
        backbone = torch.hub.load('pytorch/vision:v0.10.0', 'resnext50_32x4d', pretrained=True)

        # Remove the last two layers (pooling and fc)
        self.feature_extractor_2 = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, x):
        # Input [(batch_size, channel_nr, height, width)]
        h_2 = self.feature_extractor_2(x)
        # Output size [(batch_size, 2048, 19, 19)]

        # classifier Y = ....

        return Y, probs
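
On the batching part of the question: PyTorch layers take the whole batch as one tensor, so all examples go through the forward pass together, and a single backward call produces gradients that are accumulated over the batch. The sketch below checks this on a small stand-in network (hypothetical, not ResNet-50, so that nothing needs to be downloaded) by comparing one batched pass against a loop over single examples. It assumes a summed loss and a model without BatchNorm. With BatchNorm in training mode, as in ResNet-50, the batch statistics couple the examples, so the two variants would no longer match exactly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in for "encoder + classifier" (hypothetical, not ResNet-50).
# Double precision keeps the numerical comparison below tight.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 5),
).double()

x = torch.randn(4, 3, 16, 16, dtype=torch.double)  # batch of 4 examples
target = torch.tensor([0, 1, 2, 3])
criterion = nn.CrossEntropyLoss(reduction="sum")

# Variant 1: the whole batch in a single forward and backward pass
model.zero_grad()
batched_out = model(x)
criterion(batched_out, target).backward()
batched_grad = model[0].weight.grad.clone()

# Variant 2: one example at a time; .grad accumulates across backward calls
model.zero_grad()
single_outs = []
for i in range(x.size(0)):
    out_i = model(x[i:i + 1])
    criterion(out_i, target[i:i + 1]).backward()
    single_outs.append(out_i.detach())
single_out = torch.cat(single_outs)
looped_grad = model[0].weight.grad.clone()

print(torch.allclose(batched_out, single_out))    # outputs agree
print(torch.allclose(batched_grad, looped_grad))  # gradients agree
```

The batched variant is what a normal training loop does; the per-example loop is shown only for comparison and would be much slower in practice.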

ptrblck · January 31, 2023, 2:03am

I assume the classifier is a (block of) linear layers (with activation functions) and tries to map the input features to the output logits?
If so, note that the output shape would differ between the two approaches: in the second one (using the 4-dimensional feature tensor), the linear layer acts only on the last dimension and is effectively “iterated” over the additional dimensions.
Here is a small example:

import torch
import torch.nn as nn

batch_size = 16

# 2D
features = torch.randn(batch_size, 1000)
classifier = nn.Linear(1000, 10)

out = classifier(features)
print(out.shape)
# torch.Size([16, 10])

# 4D
features = torch.randn(batch_size, 2048, 19, 19)
classifier = nn.Linear(19, 10) # the last dimension of features defines the in_features!
out = classifier(features)
print(out.shape)
# torch.Size([16, 2048, 19, 10])

The linear layer with in_features=19 is applied to every 19-element slice along the last dimension, i.e. to all positions in dim0, dim1, and dim2.
You can verify it by flattening the input:

# test
ref = classifier(features.view(-1, 19))
print((out.view(-1, 10) - ref).abs().max())
# tensor(0., grad_fn=<MaxBackward1>)
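
If the goal is one logit vector per example from the 4D features, one possible approach is to reduce the spatial dimensions before the linear layer. The sketch below shows two options, global average pooling and plain flattening. It assumes 10 classes and uses random tensors in place of the encoder output; `num_classes`, `pool_head`, and `flat_head` are illustrative names, not part of the original code.

```python
import torch
import torch.nn as nn

batch_size = 16
num_classes = 10  # assumed number of classes, for illustration only

# Random stand-in for the second encoder's output
features = torch.randn(batch_size, 2048, 19, 19)

# Option 1: global average pooling, then a linear layer over the channels
pool_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),        # [16, 2048, 1, 1]
    nn.Flatten(),                   # [16, 2048]
    nn.Linear(2048, num_classes),   # [16, 10]
)
pooled_logits = pool_head(features)
print(pooled_logits.shape)
# torch.Size([16, 10])

# Option 2: flatten all non-batch dimensions (much larger weight matrix)
flat_head = nn.Sequential(
    nn.Flatten(),                              # [16, 2048 * 19 * 19]
    nn.Linear(2048 * 19 * 19, num_classes),    # [16, 10]
)
flat_logits = flat_head(features)
print(flat_logits.shape)
# torch.Size([16, 10])
```

Pooling keeps the classifier small but discards spatial layout, while flattening keeps it at the cost of many more parameters; which one fits depends on the task.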