I’m attempting feature extraction in an unorthodox way. I extract features with the network in eval() mode so that the batch norm and dropout layers are switched off and the running means and standard deviations learned during ImageNet pretraining are used.
I use the feature extractor on two related images, concatenate the two feature tensors along the feature dimension, and pass the result through a linear classifier for training. I’m wondering whether I can avoid using with torch.no_grad(), since the two models are otherwise unrelated.
Here is a simplified version:
import torch
import torch.nn as nn

num_classes = 2
num_epochs = 10

densenet = DenseNetConv()
densenet.eval()  # set densenet to eval to switch off batch norm and dropout layers
densenet.to(device)

classifier = nn.Linear(4416, num_classes)
classifier.to(device)

criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)  # only the classifier is optimized

for epoch in range(num_epochs):
    classifier.train()
    for i, (inputs_1, inputs_2, labels) in enumerate(dataloaders_dict['train']):
        inputs_1 = inputs_1.to(device)
        inputs_2 = inputs_2.to(device)
        labels = labels.to(device)

        features_1 = densenet(inputs_1)  # extract features 1
        features_2 = densenet(inputs_2)  # extract features 2

        combined = torch.cat([features_1, features_2], dim=1)  # combine features
        combined = combined.view(-1, 4416)  # reshape

        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = classifier(combined)

        # Calculate loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        _, pred = torch.max(outputs, 1)
        equality_check = (labels.data == pred)

        # Getting gradients w.r.t. parameters
        loss.backward()
        optimizer.step()
As you can see, I do not use with torch.no_grad(), despite densenet.eval() being called on my separate feature extractor. Is there an issue with the way this is implemented, or can I assume that it will not interfere with the training of the classifier?
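For comparison, here is a minimal sketch of the alternative I am asking about, with only the feature extraction wrapped in torch.no_grad() (assuming densenet, classifier, criterion, optimizer, dataloaders_dict and device are defined as above). As far as I understand, the only difference should be that no autograd graph is built for the densenet forward passes, which saves memory, while the classifier is still trained normally.

for epoch in range(num_epochs):
    classifier.train()
    for i, (inputs_1, inputs_2, labels) in enumerate(dataloaders_dict['train']):
        inputs_1 = inputs_1.to(device)
        inputs_2 = inputs_2.to(device)
        labels = labels.to(device)

        with torch.no_grad():  # no gradients tracked for the frozen extractor
            features_1 = densenet(inputs_1)
            features_2 = densenet(inputs_2)

        combined = torch.cat([features_1, features_2], dim=1)
        combined = combined.view(-1, 4416)

        optimizer.zero_grad()
        outputs = classifier(combined)  # gradients are tracked from here on
        loss = criterion(outputs, labels)
        loss.backward()                 # only classifier parameters receive gradients
        optimizer.step()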