Freeze the learnable parameters of resnet and attach it to a new network

mderakhshani · March 8, 2017, 8:05am

Hi there,
I have a question about using ResNet18 as a feature extractor (without fine-tuning its parameters) in a newly defined network. Here’s my code:

import torch
import torch.nn as nn
import torch.nn.functional as f
import torchvision
from torch.autograd import Variable as V


class TestNet(nn.Module):

    def __init__(self, extractor):
        super(TestNet, self).__init__()
        # Select the feature layers: drop the last two children (avgpool and fc)
        self.features = nn.Sequential(
            *list(extractor.children())[:-2]
        )
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.conv1 = nn.Conv2d(512, 1024, 3, padding=1)
        self.conv2 = nn.Conv2d(1024, 512, 1)
        self.conv3 = nn.Conv2d(512, 1024, 3, padding=1)
        self.conv4 = nn.Conv2d(1024, 512, 1)
        self.conv5 = nn.Conv2d(512, 1024, 3, padding=1)
        self.final = nn.Conv2d(1024, 30, 1)

    def forward(self, input):
        output = self.features(input)
        output = self.maxpool1(output)
        output = self.conv1(output)
        output = self.conv2(output)
        output = self.conv3(output)
        output = self.conv4(output)
        output = self.conv5(output)
        output = f.dropout(output, p=0.5)
        output = self.final(output)
        output = f.sigmoid(output)
        return output


resnet18 = torchvision.models.resnet18(pretrained=True)
volatile = V(torch.randn(1, 3, 224, 224), volatile=True)
resnet18.eval()
output = resnet18(volatile)

net = TestNet(resnet18)

I would like to know whether this approach is correct. As I understand it, only the last 6 or 8 layers have learnable parameters. Am I right?

apaszke · March 8, 2017, 4:14pm

If there are parameters you don’t want to optimize you should set their requires_grad flag to False. .eval() only changes the behaviour of modules like dropout or batch norm and should not be enabled during training.

mderakhshani · March 8, 2017, 4:48pm

@apaszke, thanks for your answer. Do you mean the code above is wrong? If so, could you help me fix it? What should I change?

apaszke · March 8, 2017, 4:49pm

for param in net.features.parameters():
    param.requires_grad = False
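
To illustrate the suggestion above, here is a minimal sketch. It uses a small stand-in backbone instead of the pretrained ResNet18 so that it runs without downloading weights; the names TinyNet and backbone are hypothetical and do not come from the code in this thread.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for TestNet: frozen extractor plus a trainable layer."""

    def __init__(self, extractor):
        super().__init__()
        self.features = extractor        # plays the role of the ResNet18 layers
        self.final = nn.Conv2d(8, 3, 1)  # new layer that should be trained

    def forward(self, x):
        return self.final(self.features(x))


backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
net = TinyNet(backbone)

# Freeze only the feature extractor, as suggested above
for param in net.features.parameters():
    param.requires_grad = False

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
frozen = [name for name, p in net.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)

# After a backward pass, frozen parameters receive no gradient
net(torch.randn(2, 3, 4, 4)).sum().backward()
```

Listing the parameter names this way is a quick check that only the newly added layers will be updated.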

mderakhshani · March 8, 2017, 4:51pm

@apaszke, thanks. Could you also comment on my code above? If something in it is wrong, I would like to fix it myself.

mderakhshani · March 8, 2017, 4:57pm

@apaszke, one more question: resnet18 does not have a features attribute.
Here is my code:

li = resnet.features.parameters()

and here is the error:

AttributeError Traceback (most recent call last)
in ()
----> 1 li = resnet.features.parameters()

/home/mohammad/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
241 if name in modules:
242 return modules[name]
--> 243 return object.__getattribute__(self, name)
244
245 def __setattr__(self, name, value):

AttributeError: 'ResNet' object has no attribute 'features'

albanD · March 8, 2017, 5:10pm

I think he was referring to your custom net, the one you get with net = TestNet(resnet18), whose .features attribute you set to the feature extractor of the ResNet.

mderakhshani · March 8, 2017, 5:42pm

@albanD thanks. I have got it!

mderakhshani · March 9, 2017, 8:41am

@albanD, one more question about the code above. If I want to define an optimizer, for example SGD, what should I do for a network defined this way? Here is my code to get the network’s parameters and define the optimizer, but it raises an error when the optimizer is created:

My code:

parameters = net.parameters()
optimizer = optim.SGD(params = parameters, lr = learning_rate, momentum=momentum, weight_decay = weight_decay)

Error:

ValueError Traceback (most recent call last)
in ()
15 label = torch.randn(1,nc,imageSize[0], imageSize[1])
16 parameters = net.parameters()
---> 17 optimizer = optim.SGD(params = parameters, lr = learning_rate, momentum=momentum, weight_decay = weight_decay)

/home/mohammad/anaconda3/lib/python3.6/site-packages/torch/optim/sgd.py in __init__(self, params, lr, momentum, dampening, weight_decay)
24 defaults = dict(lr=lr, momentum=momentum, dampening=dampening,
25 weight_decay=weight_decay)
---> 26 super(SGD, self).__init__(params, defaults)
27
28 def step(self, closure=None):

/home/mohammad/anaconda3/lib/python3.6/site-packages/torch/optim/optimizer.py in __init__(self, params, defaults)
56 "but one of the params is " + torch.typename(param))
57 if not param.requires_grad:
---> 58 raise ValueError("optimizing a parameter that doesn't "
59 "require gradients")
60 if param.creator is not None:

ValueError: optimizing a parameter that doesn't require gradients

So what should I do to fix this error?
Thanks

albanD · March 9, 2017, 10:06am

Hi,
The problem is that some of the parameters you give the optimizer do not require gradients, so it doesn’t know how to handle them.

You can fix this with the ifilter function from Python’s itertools module:

from itertools import ifilter
parameters = ifilter(lambda p: p.requires_grad, net.parameters())

mderakhshani · March 9, 2017, 11:01am

@albanD, thank you. Just a note: ifilter does not exist in Python 3.x.

albanD · March 9, 2017, 11:06am

Oh, good point.
I guess you have to use filterfalse with the opposite condition in Python 3.x.

apaszke · March 9, 2017, 7:03pm

Or just filter (which is lazy in Py3)
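
Putting the suggestions in this thread together, a minimal Python 3 sketch might look like the following. The two-layer model is a stand-in rather than the TestNet defined earlier, and the hyperparameter values are placeholders chosen only for illustration.

```python
import torch.nn as nn
import torch.optim as optim
from itertools import filterfalse

net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),  # stands in for the frozen feature extractor
    nn.Conv2d(8, 3, 1),             # stands in for the new trainable layers
)
for param in net[0].parameters():
    param.requires_grad = False

# Built-in filter is lazy in Python 3 and keeps items for which the predicate is true
parameters = filter(lambda p: p.requires_grad, net.parameters())
optimizer = optim.SGD(parameters, lr=0.01, momentum=0.9, weight_decay=1e-4)

# filterfalse keeps items for which the predicate is false, so the condition is inverted
same_parameters = list(filterfalse(lambda p: not p.requires_grad, net.parameters()))

optimized = optimizer.param_groups[0]["params"]
print(len(optimized), "parameters passed to the optimizer")
```

Both forms select the same parameters; the built-in filter is simply the shorter way to write it.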

James_Chen · October 30, 2017, 7:51am

I am wondering whether to call .eval() on those frozen layers, since they may still update their running mean and running variance during training even though their parameters are not being learned.
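
This concern can be checked with a small experiment. The sketch below uses a single BatchNorm2d layer as a stand-in for the frozen ResNet layers and assumes the default BatchNorm settings (running statistics are tracked); it is only an illustration, not a recommendation made by anyone in this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)
for param in bn.parameters():
    param.requires_grad = False  # weight and bias are frozen

x = torch.randn(8, 4, 5, 5) + 3.0  # batch whose mean is far from the initial running mean of 0

# Training mode: running statistics are updated even though the parameters are frozen
bn.train()
before = bn.running_mean.clone()
bn(x)
changed_in_train = not torch.equal(before, bn.running_mean)

# Eval mode: running statistics are left untouched
bn.eval()
before = bn.running_mean.clone()
bn(x)
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train, changed_in_eval)
```

Note that calling net.train() on a wrapper module switches every submodule back to training mode, so net.features.eval() would have to be called again afterwards if the running statistics are meant to stay fixed.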