Correct way to initialise pretrained model during inference

I trained a model by freezing a pretrained backbone for feature extraction and training only the last layer. I then saved this model for use at inference.

In a separate script, I was wondering whether initialising my model with pretrained=True is also correct at inference, as shown in the following:

import torch
from torchvision import models

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# rebuild the architecture and replace the classifier head
model = models.densenet161(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)
model.to(device)

# load the best model checkpoint and restore the saved weights
bestmodel = get_best_model(best)
bestmodel = torch.load(bestmodel)
model.load_state_dict(bestmodel['classifier'])

# set model to evaluation mode and disable gradient tracking
model.eval()
with torch.no_grad():
    ...  # inference on the test data goes here

If you are loading all parameters and buffers from the state_dict (which seems to be the case here), you wouldn’t need to use pretrained=True during the model initialization, since all internal parameters/buffers will be replaced.
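For example, a minimal sketch (reusing the variables from your snippet, and assuming the 'classifier' entry in the checkpoint holds the full state_dict) would be:

model = models.densenet161(pretrained=False)  # random init; the weights come from the checkpoint
num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs, 2)

checkpoint = torch.load(get_best_model(best))
model.load_state_dict(checkpoint['classifier'])  # replaces all parameters and buffers
model.to(device)
model.eval()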


Ah ok, but I’m guessing even if it is set to True, once parameters are loaded from the state dict, it should really be the same thing?

Which leads on to my next question: I want to switch off all batch norm/dropout layers in the feature extractor.

I define a feature extraction model like this:


import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class DenseNetConv(torch.nn.Module):
    def __init__(self):
        super(DenseNetConv, self).__init__()
        original_model = models.densenet161(pretrained=True)
        # keep everything except the final classifier layer
        self.features = torch.nn.Sequential(*list(original_model.children())[:-1])
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        # freeze the backbone so no gradients are computed for it
        for param in self.parameters():
            param.requires_grad = False

    def forward(self, x):
        x = self.features(x)
        x = F.relu(x, inplace=True)
        # global average pooling; for 224x224 inputs the feature map is 7x7
        x = F.avg_pool2d(x, kernel_size=7).view(x.size(0), -1)
        return x
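As a quick sanity check (a hypothetical snippet, assuming 224x224 inputs), the extractor returns 2208-dimensional feature vectors for densenet161, which is where the in_features of the classifier below comes from:

densenet = DenseNetConv()
densenet.eval()
with torch.no_grad():
    feats = densenet(torch.randn(4, 3, 224, 224))
print(feats.shape)  # torch.Size([4, 2208])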

I then define a separate linear layer model like this:

classifier = nn.Linear(2208, args.num_classes)

and for training I do this:

densenet.eval()  # set to eval mode to switch off batch norm/dropout behaviour
densenet.requires_grad_(False)
densenet.to(device)

classifier.to(device)

# criterion and optimizer (optimising classifier.parameters()) are defined earlier
for epoch in range(num_epochs):
    total_loss = 0.0
    for i, (inputs, labels) in enumerate(dataloaders_dict):
        inputs = inputs.to(device)
        labels = labels.to(device)

        features = densenet(inputs)  # extract features with the frozen backbone

        optimizer.zero_grad()

        # forward pass to get output/logits
        outputs = classifier(features)  # pass features into the classifier model

        # calculate loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        loss.backward()   # gradients only flow through the classifier
        optimizer.step()
        total_loss += loss.item()

Then at inference, I repeat the same steps for feature extraction, but also set the classifier model to evaluation mode.
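Roughly, the inference step would be (a sketch, where test_loader is a placeholder name for my test dataloader):

densenet.eval()
classifier.eval()
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs = inputs.to(device)
        features = densenet(inputs)    # frozen feature extractor
        outputs = classifier(features)
        preds = outputs.argmax(dim=1)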

Would this be appropriate if all I want to do is train the classifier model, with no further updates to the batch norm statistics and no use of dropout in the feature extractor?

I thought it might be a simpler way of doing this rather than setting each batch norm layer to eval() …
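For reference, the per-layer alternative I was hoping to avoid would look something like this sketch (touching only the norm/dropout modules):

for module in densenet.modules():
    if isinstance(module, (nn.BatchNorm2d, nn.Dropout)):
        module.eval()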