I trained a model as a fixed feature extractor, updating only the last layer, and then saved it for use at inference.
In a separate script, is it also correct to initialise the model with pretrained=True at inference, as shown in the following?
model = models.densenet161(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
num_ftrs = model.classifier.in_features
model.classifier = torch.nn.Linear(num_ftrs,2)
model.to(device)
# load the best model
bestmodel = get_best_model(best)
bestmodel = torch.load(bestmodel)
model.load_state_dict(bestmodel['classifier'])
# set model to evaluation mode
model.eval()
with torch.no_grad():
If you are loading all parameters and buffers from the state_dict (which seems to be the case here), you wouldn’t need to use pretrained=True during the model initialization, since all internal parameters and buffers will be overwritten by the loaded values.
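To make the point about overwritten parameters concrete, here is a minimal sketch. It assumes a small toy network (`make_model` is a hypothetical stand-in for the DenseNet, so nothing has to be downloaded) and an in-memory buffer instead of a checkpoint file. Two differently initialised copies end up identical after `load_state_dict`, which with the default `strict=True` also raises an error if any key is missing or unexpected.

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for the DenseNet: conv + batch norm + linear head.
    return nn.Sequential(
        nn.Conv2d(3, 4, kernel_size=3, padding=1),
        nn.BatchNorm2d(4),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(4, 2),
    )


torch.manual_seed(0)
trained = make_model()
# One forward pass in train mode so the batch norm buffers move off their defaults.
trained(torch.randn(8, 3, 16, 16))

# Save the full state_dict (in memory here; a file path works the same way).
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

torch.manual_seed(1)
fresh = make_model()  # different initial weights, playing the role of "pretrained or not"
fresh.load_state_dict(torch.load(buffer))  # strict=True by default

# Every parameter and buffer in the fresh model now matches the saved one.
all_equal = all(
    torch.equal(value, fresh.state_dict()[name])
    for name, value in trained.state_dict().items()
)
print(all_equal)
```

If the checkpoint only held part of the model (for example just the final layer), the initial weights of the remaining layers would still matter, so this only applies when the whole state_dict is loaded.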
That leads on to my next question. I want to switch off all batch norm and dropout layers in the feature extractor.
I define a feature extraction model like this:
class DenseNetConv(torch.nn.Module):
    def __init__(self):
        super(DenseNetConv,self).__init__()
        original_model = models.densenet161(pretrained=True)
        self.features = torch.nn.Sequential(*list(original_model.children())[:-1])
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        for param in self.parameters():
            param.requires_grad = False
    def forward(self, x):
        x = self.features(x)
        x = F.relu(x, inplace=True)
        x = F.avg_pool2d(x, kernel_size=7).view(x.size(0), -1)
        return x
I then define a separate linear layer model like this:
classifier = nn.Linear(2208, args.num_classes)
and for training I do this:
densenet.eval()  # eval mode: batch norm uses its running statistics, dropout is disabled
densenet.requires_grad_(False)
densenet.to(device)
classifier.to(device)
for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(dataloaders_dict):
        inputs = inputs.to(device)
        labels = labels.to(device)
        features = densenet(inputs)  # extract features
        optimizer.zero_grad()
        # Forward pass to get output/logits
        outputs = classifier(features)  # pass features into classifier model
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        total_loss += loss.item()
Then at inference, I repeat the same feature extraction step, but also set the classifier model to evaluation mode.
Would this be appropriate if all I want to do is train the classifier model, with no further updates to the batch norm statistics and no dropout in the feature extractor?
I thought it might be simpler than setting each batch norm layer to eval() individually …
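On the last point, `eval()` on a parent module is recursive, so one call should cover every nested batch norm and dropout layer. A minimal sketch to check this, assuming a hypothetical `TinyExtractor` in place of the DenseNet:

```python
import torch
import torch.nn as nn


class TinyExtractor(nn.Module):
    # Hypothetical nested module with batch norm and dropout buried inside.
    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(4),
            nn.Dropout(p=0.5),
        )

    def forward(self, x):
        return self.block(x)


torch.manual_seed(0)
model = TinyExtractor()
model.eval()  # recursive: equivalent to model.train(False)

# Collect the training flag of the module itself and all of its submodules.
modes = [m.training for m in model.modules()]
print(any(modes))

bn = model.block[1]
mean_before = bn.running_mean.clone()
x = torch.randn(8, 3, 8, 8)
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

# In eval mode the running statistics stay fixed and dropout is a no-op,
# so two passes over the same input give the same output.
stats_frozen = torch.equal(mean_before, bn.running_mean)
deterministic = torch.allclose(out1, out2)
print(stats_frozen, deterministic)
```

One caveat: `train()` is recursive too, so calling it later on a parent module that contains the extractor would put these layers back into training mode. With the extractor and classifier kept as separate modules, as in the post, that does not come up.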