How to use a network with batchnorm in eval mode as a feature extractor for a different network

So I have a network, let’s call it model1. This is the model I want to train.
Similarly to style transfer, I want to use a different network (let’s call it model2; in style transfer, model2 = VGG) as a feature extractor for the loss function.

So I put model2 in eval mode while training model1. However, even though I set model2’s parameters to requires_grad=False, I get the error “cudnn RNN backward can only be called in training mode” because model2 is in eval mode.

Is there a way to train model1 while model2 is in eval mode as a feature extractor?

Thanks

I think you just need to set requires_grad=False and you are good to go. No model.eval() is needed. Check this.
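
For example, a minimal sketch of what freezing looks like (using vgg16_bn as a stand-in for your model2):

from torchvision.models import vgg16_bn

model2 = vgg16_bn(pretrained=True)
for param in model2.parameters():
    param.requires_grad = False  # freeze: no gradients are computed for model2
# or equivalently: model2.requires_grad_(False)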

Hi,

What you said about requires_grad=False is true, and as @Isaac_Kargar mentioned, you should not put the model in eval mode. Putting a model in eval mode does not affect the autograd engine; it only switches layers like dropout or batchnorm to their eval behavior, which, in the case of extracting features, is not desired. (see this post)
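
To see the difference concretely, here is a minimal sketch with a standalone BatchNorm2d layer: eval mode changes which statistics the layer uses, but autograd keeps tracking gradients either way.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(2, 3, 4, 4, requires_grad=True)

bn.train()
out_train = bn(x)  # normalizes with batch statistics (and updates running stats)

bn.eval()
out_eval = bn(x)   # normalizes with the stored running statistics

print(torch.allclose(out_train, out_eval))  # False: the outputs differ
print(out_eval.requires_grad)               # True: gradients are still tracked in eval mode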
Here is a snippet that uses VGG16 with batch norm as the feature extractor and a simple one-layer linear model as model1 in your case. It works just fine, although the model itself is complete nonsense; I kept the code simple to demonstrate the idea.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16_bn
from torch import optim
# a model that gets another model as feature_extractor
class Model(nn.Module):
    def __init__(self, feature_extractor):
        super(Model, self).__init__()
        self.feature_extractor = feature_extractor  # model2 in your case
        self.layer = nn.Linear(3*256*256, 1000)  # model1 in your case
    def forward(self, x):
        features = self.feature_extractor(x)
        x = x.view(x.size(0), -1)  # flatten without hard-coding the batch size
        x = self.layer(x)
        x += features  # using features from model2
        return x


model2 = vgg16_bn(pretrained=True).eval()
for param in model2.parameters():
    param.requires_grad = False
model1 = Model(feature_extractor=model2)
model1.train()

criterion = nn.L1Loss()
# note: model1.parameters() also yields model2’s parameters here, but they are
# frozen (requires_grad=False), so the optimizer never updates them
optimizer = optim.SGD(model1.parameters(), lr=0.001, momentum=0.9) 

running_loss = 0.0
for i in range(3):
    x = torch.randn(5, 3, 256, 256)  # stand-in for inputs that change every batch
    optimizer.zero_grad()

    with torch.set_grad_enabled(True):
        outputs = model1(x)
        loss = criterion(outputs, torch.ones(outputs.shape))  # a weird loss!

        loss.backward()
        optimizer.step()

    running_loss += loss.item() * x.size(0)
    print(running_loss)

Here is an elaborate post about using pretrained models as feature extractors.
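
If you only need intermediate activations rather than the classifier logits, one common pattern is to slice the convolutional part (a sketch, assuming torchvision’s VGG layout with its .features attribute; the cut point is arbitrary):

import torch
from torchvision.models import vgg16_bn

vgg = vgg16_bn(pretrained=True).eval()
extractor = vgg.features[:23]  # an arbitrary cut point, just for illustration
for p in extractor.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 256, 256)
feats = extractor(x)  # intermediate feature maps instead of logits
print(feats.shape)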

Bests

Won’t it impact performance if I’m using batchnorm in training mode?

@Nikronic I might not have explained myself well / used the wrong terminology. When I wrote “feature extractor” I meant using the VGG on the outputs of model1. For example:

model1.train()
y = model1(x)
features = model2(y)
features_truth = model2(y_ground_truth)
loss_on_deep_features = criterion(features, features_truth)
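
(Spelled out as a runnable sketch, with a toy conv layer standing in for model1 and random tensors standing in for the input and ground truth:)

import torch
import torch.nn as nn
from torchvision.models import vgg16_bn

model1 = nn.Conv2d(3, 3, 3, padding=1)     # placeholder for the model being trained
model2 = vgg16_bn(pretrained=True).eval()  # eval is fine for VGG (no RNN layers)
for p in model2.parameters():
    p.requires_grad = False

criterion = nn.L1Loss()
x = torch.randn(2, 3, 256, 256)               # dummy input
y_ground_truth = torch.randn(2, 3, 256, 256)  # dummy target image

y = model1(x)
features = model2(y)   # gradients must flow through model2 back into model1
with torch.no_grad():  # the target features need no graph
    features_truth = model2(y_ground_truth)
loss_on_deep_features = criterion(features, features_truth)
loss_on_deep_features.backward()

Note the asymmetry: only the target pass can go under no_grad; wrapping model2(y) in it would cut the gradient path to model1.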

I wondered if using model2 in train mode would hurt the performance (give me incorrect features of y and y_ground_truth). When I tried putting model2 in eval mode I got the above error, even though I set requires_grad = False for model2.
Anyway thank you for your answer and time!

You can test and see the performance.

Something I think I have to mention is that setting model.train() or model.eval() has no effect on autograd. For instance, dropout layers have to be turned off in eval mode, so in some situations you do need model.eval(); but if you do not set requires_grad=False, gradients are still computed for those parameters, and that affects training.

In your case, you do not need to put model2 in eval mode, but you do have to turn gradients off (if you don’t, it will affect model1’s parameters too).

Here is the new snippet that applies model2 to model1’s output:

import torch
import torch.nn as nn
from torch import optim
from torchvision.models import vgg16_bn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.layer = nn.Conv2d(3, 3, 3, 1, 1)  # model1: a single toy conv layer
    def forward(self, x):
        x = self.layer(x)
        return x

model2 = vgg16_bn(pretrained=True).eval()
for param in model2.parameters():
    param.requires_grad = False
model1 = Model()
model1.train()


criterion = nn.L1Loss()
optimizer = optim.SGD(model1.parameters(), lr=0.001, momentum=0.9)

running_loss = 0.0
for i in range(3):
    x = torch.randn(5, 3, 256, 256)
    # zero the parameter gradients
    optimizer.zero_grad()

    # forward
    # track history only if in train
    with torch.set_grad_enabled(True):
        outputs = model1(x)
        outputs = model2(outputs)
        loss = criterion(outputs, torch.ones(outputs.shape))

        # backward + optimize only if in training phase
        loss.backward()
        optimizer.step()

    # statistics
    running_loss += loss.item()
    print(running_loss)
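
As a quick sanity check that gradients really stop at model2, one could inspect .grad after the loop above has run:

print(all(p.grad is None for p in model2.parameters()))      # True: model2 stays frozen
print(any(p.grad is not None for p in model1.parameters()))  # True: model1 is training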