Freeze the learnable parameters of resnet and attach it to a new network

(Mohammad Mehdi Derakhshani) #1

Hi there,
I have a question about using ResNet18 as a feature extractor (without fine-tuning its parameters) in a newly defined network. Here's my code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class TestNet(nn.Module):

    def __init__(self, extractor):
        super(TestNet, self).__init__()
        # Select features: every ResNet block except avgpool and fc (512 output channels)
        self.features = nn.Sequential(*list(extractor.children())[:-2])
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.conv1 = nn.Conv2d(512, 1024, 3, padding=1)
        self.conv2 = nn.Conv2d(1024, 512, 1)
        self.conv3 = nn.Conv2d(512, 1024, 3, padding=1)
        self.conv4 = nn.Conv2d(1024, 512, 1)
        self.conv5 = nn.Conv2d(512, 1024, 3, padding=1)
        self.conv6 = nn.Conv2d(1024, 30, 1)

    def forward(self, input):
        output = self.features(input)
        output = self.maxpool1(output)
        output = self.conv1(output)
        output = self.conv2(output)
        output = self.conv3(output)
        output = self.conv4(output)
        output = self.conv5(output)
        # training=self.training turns dropout off after net.eval()
        output = F.dropout(output, p=0.5, training=self.training)
        output = self.conv6(output)
        output = torch.sigmoid(output)
        return output

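resnet18 = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
# torch.no_grad() replaces the old Variable(..., volatile=True) inference mode
with torch.no_grad():
    output = resnet18(torch.randn(1, 3, 224, 224))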

net = TestNet(resnet18)

I would like to know whether this approach is correct. As far as I can tell, only the last 6 or 8 layers have learnable parameters. Am I right?

(Adam Paszke) #2

If there are parameters you don’t want to optimize you should set their requires_grad flag to False. .eval() only changes the behaviour of modules like dropout or batch norm and should not be enabled during training.
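To illustrate the difference, here is a small sketch on a hypothetical toy model (a Linear layer followed by Dropout, not the network above): .eval() makes dropout deterministic but leaves every parameter trainable, whereas setting requires_grad = False is what actually freezes them.

```python
import torch
import torch.nn as nn

# Hypothetical toy model: a linear layer followed by dropout
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

# .eval() only changes module behaviour: dropout becomes the identity
model.eval()
out_a = model(x)
out_b = model(x)
print(torch.equal(out_a, out_b))  # True: no random masking in eval mode

# ...but the parameters still require gradients
print(all(p.requires_grad for p in model.parameters()))  # True

# Freezing is controlled by requires_grad
for p in model.parameters():
    p.requires_grad = False
print(any(p.requires_grad for p in model.parameters()))  # False
```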

(Mohammad Mehdi Derakhshani) #3

@apaszke Hey, thanks for your answer. Do you mean the code above is wrong? If so, could you help me fix it? What should I change?

(Adam Paszke) #4
for param in net.features.parameters():
    param.requires_grad = False
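A quick way to confirm the freeze works is to run one backward pass and check which parameters received a gradient. This sketch uses a hypothetical TinyNet with the same features/head split as TestNet, kept small so it runs fast:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for TestNet: a "features" backbone plus a trainable head
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(8, 2, 1)

    def forward(self, x):
        return self.head(self.features(x))

net = TinyNet()

# Freeze the backbone, as in the reply above
for param in net.features.parameters():
    param.requires_grad = False

loss = net(torch.randn(1, 3, 8, 8)).sum()
loss.backward()

# Frozen parameters get no gradient; the head still does
print([p.grad is None for p in net.features.parameters()])  # [True, True]
print([p.grad is None for p in net.head.parameters()])      # [False, False]
```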

(Mohammad Mehdi Derakhshani) #5

@apaszke Thanks. Could you please comment on my code above? If it is wrong, I would like to fix it myself!

(Mohammad Mehdi Derakhshani) #6

@apaszke One more question: resnet18 does not have a features attribute.
Here is my code:

li = resnet.features.parameters()

and here is the error:

AttributeError Traceback (most recent call last)
in <module>()
----> 1 li = resnet.features.parameters()

/home/mohammad/anaconda3/lib/python3.6/site-packages/torch/nn/modules/ in __getattr__(self, name)
    241 if name in modules:
    242 return modules[name]
--> 243 return object.__getattribute__(self, name)
    245 def __setattr__(self, name, value):

AttributeError: 'ResNet' object has no attribute 'features'

(Alban D) #7

I think he was referring to your custom net, net = TestNet(resnet18), whose .features attribute you set to the feature extractor taken from the ResNet.

(Mohammad Mehdi Derakhshani) #8

@albanD Thanks, I've got it!

(Mohammad Mehdi Derakhshani) #9

@albanD One more question about the code above. If I want to define an optimizer for this network, for example SGD, what should I do? Here is my code to get the network's parameters and define the optimizer, but it raises an error:

My code:

parameters = net.parameters()
optimizer = optim.SGD(params = parameters, lr = learning_rate, momentum=momentum, weight_decay = weight_decay)


ValueError Traceback (most recent call last)
in <module>()
     15 label = torch.randn(1,nc,imageSize[0], imageSize[1])
     16 parameters = net.parameters()
---> 17 optimizer = optim.SGD(params = parameters, lr = learning_rate, momentum=momentum, weight_decay = weight_decay)

/home/mohammad/anaconda3/lib/python3.6/site-packages/torch/optim/ in __init__(self, params, lr, momentum, dampening, weight_decay)
     24 defaults = dict(lr=lr, momentum=momentum, dampening=dampening,
     25 weight_decay=weight_decay)
---> 26 super(SGD, self).__init__(params, defaults)
     28 def step(self, closure=None):

/home/mohammad/anaconda3/lib/python3.6/site-packages/torch/optim/ in __init__(self, params, defaults)
     56 "but one of the params is " + torch.typename(param))
     57 if not param.requires_grad:
---> 58 raise ValueError("optimizing a parameter that doesn't "
     59 "require gradients")
     60 if param.creator is not None:

ValueError: optimizing a parameter that doesn't require gradients

So how can I fix this error?

(Alban D) #10

The problem is that some of the parameters you give the optimizer do not require gradients, so it doesn't know how to handle them.

You can fix this using the ifilter function from Python's itertools module:

from itertools import ifilter
parameters = ifilter(lambda p: p.requires_grad, net.parameters())

(Mohammad Mehdi Derakhshani) #11

@albanD Thank you. But note that ifilter does not exist in Python 3.x.

(Alban D) #12

Oh, good point.
I guess you have to use itertools.filterfalse with the opposite condition in Python 3.x.
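A sketch of the filterfalse variant on a hypothetical two-layer model with its first layer frozen: because filterfalse keeps the items for which the predicate is false, the condition has to be negated.

```python
from itertools import filterfalse

import torch.nn as nn

# Hypothetical two-layer model with the first layer frozen
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

# filterfalse keeps items whose predicate is False, so the condition is inverted:
# drop the parameters that do NOT require gradients
trainable = list(filterfalse(lambda p: not p.requires_grad, model.parameters()))
print(len(trainable))  # 2: weight and bias of the second Linear
```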

(Adam Paszke) #13

Or just use the built-in filter (which is lazy in Python 3).
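A minimal end-to-end sketch with the built-in filter, on a hypothetical two-layer model with the first layer frozen; the learning rate, momentum and weight decay values are arbitrary placeholders. (Recent PyTorch releases no longer raise this ValueError for parameters that don't require gradients, since their .grad stays None and step() skips them, but filtering keeps them out of the optimizer explicitly.)

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical model: frozen first layer, trainable second layer
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

# filter() is lazy in Python 3; the optimizer consumes it once into a list
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                      lr=0.1, momentum=0.9, weight_decay=1e-4)

frozen_before = model[0].weight.clone()
head_before = model[1].weight.clone()

# One training step
loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(model[0].weight, frozen_before))  # True: frozen layer untouched
print(torch.equal(model[1].weight, head_before))    # False: head was updated
```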

(James Chen) #14

I am wondering whether we should call .eval() on those frozen layers, since their batch norm layers may still update running_mean and running_var during training even though their parameters are not being learned.
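The thread leaves this open. One common approach, sketched here on a hypothetical Net whose frozen part lives in a features submodule, is to override train() so the backbone always stays in eval mode; its batch norm layers then normalise with their stored statistics and stop updating running_mean/running_var:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: a frozen backbone containing batch norm, plus a trainable head
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
        self.head = nn.Conv2d(8, 2, 1)
        for p in self.features.parameters():
            p.requires_grad = False

    def train(self, mode=True):
        # Set the requested mode everywhere, then force the frozen backbone
        # back into eval mode so batch norm stops updating its running stats
        super().train(mode)
        self.features.eval()
        return self

    def forward(self, x):
        return self.head(self.features(x))

net = Net()
net.train()
bn = net.features[1]
mean_before = bn.running_mean.clone()

net(torch.randn(4, 3, 8, 8)).sum().backward()
print(torch.equal(bn.running_mean, mean_before))  # True: statistics unchanged
print(net.head.training, net.features.training)   # True False
```

Overriding train() means that calling net.train() at the start of each epoch cannot accidentally switch the backbone back to training mode.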