Correct way to initialise pretrained model during inference

I trained a model by freezing a pretrained backbone for feature extraction and training only the last layer. I then saved this model for use at inference.

In a separate script, I was wondering whether initialising my model with pretrained=True is also correct at inference, as shown in the following:

import torch
from torchvision import models

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# rebuild the architecture and replace the classifier head
model = models.densenet161(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)
model.to(device)

# load the best model checkpoint and restore the saved weights
bestmodel = get_best_model(best)
bestmodel = torch.load(bestmodel)
model.load_state_dict(bestmodel['classifier'])

# set model to evaluation mode and disable gradient tracking
model.eval()
with torch.no_grad():
    ...  # inference on the test data goes here

If you are loading all parameters and buffers from the state_dict (which seems to be the case here), you wouldn’t need to use pretrained=True during the model initialization, since all internal parameters/buffers will be replaced.
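For example, a minimal sketch (reusing the variables from your snippet, and assuming the 'classifier' entry in the checkpoint holds the full state_dict) would be:

model = models.densenet161(pretrained=False)  # random init; the weights come from the checkpoint
num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)

checkpoint = torch.load(get_best_model(best))
model.load_state_dict(checkpoint['classifier'])  # replaces all parameters and buffers
model.to(device)
model.eval()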


Ah ok, but I’m guessing even if it is set to True, once parameters are loaded from the state dict, it should really be the same thing?

Which leads on to my next question: I want to switch off all batch norm/dropout layers in the feature extractor.

I define a feature extraction model like this:


import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class DenseNetConv(torch.nn.Module):
    def __init__(self):
        super(DenseNetConv, self).__init__()
        original_model = models.densenet161(pretrained=True)
        # keep everything except the final classifier layer
        self.features = torch.nn.Sequential(*list(original_model.children())[:-1])
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        # freeze the backbone so no gradients are computed for it
        for param in self.parameters():
            param.requires_grad = False

    def forward(self, x):
        x = self.features(x)
        x = F.relu(x, inplace=True)
        # global average pooling; for 224x224 inputs the feature map is 7x7
        x = F.avg_pool2d(x, kernel_size=7).view(x.size(0), -1)
        return x
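As a quick sanity check (a hypothetical snippet, assuming 224x224 inputs), the extractor returns 2208-dimensional feature vectors for densenet161, which is where the in_features of the classifier below comes from:

densenet = DenseNetConv()
densenet.eval()
with torch.no_grad():
    feats = densenet(torch.randn(4, 3, 224, 224))
print(feats.shape)  # torch.Size([4, 2208])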

I then define a separate linear layer model like this:

classifier = nn.Linear(2208, args.num_classes)

and for training I do this:

densenet.eval()  # set to eval mode to switch off batch norm/dropout behaviour
densenet.requires_grad_(False)
densenet.to(device)

classifier.to(device)

# criterion and optimizer (optimising classifier.parameters()) are defined earlier
for epoch in range(num_epochs):
    total_loss = 0.0
    for i, (inputs, labels) in enumerate(dataloaders_dict):
        inputs = inputs.to(device)
        labels = labels.to(device)

        features = densenet(inputs)  # extract features with the frozen backbone

        optimizer.zero_grad()

        # forward pass to get output/logits
        outputs = classifier(features)  # pass features into the classifier model

        # calculate loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        loss.backward()   # gradients only flow through the classifier
        optimizer.step()
        total_loss += loss.item()

Then at inference, I repeat the same steps for feature extraction, but also set the classifier model to evaluation mode.
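Roughly, the inference step would be (a sketch, where test_loader is a placeholder name for my test dataloader):

densenet.eval()
classifier.eval()
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs = inputs.to(device)
        features = densenet(inputs)    # frozen feature extractor
        outputs = classifier(features)
        preds = outputs.argmax(dim=1)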

Would this be appropriate if all I want to do is train the classifier model, with no further updates to the batch norm statistics and no use of dropout in the feature extractor?

I thought it might be a simpler way of doing this rather than setting each batch norm layer to eval() …
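For reference, the per-layer alternative I was hoping to avoid would look something like this sketch (touching only the norm/dropout modules):

for module in densenet.modules():
    if isinstance(module, (nn.BatchNorm2d, nn.Dropout)):
        module.eval()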