Pre-trained model and finetuning


(lauren) #1

I am using an off-the-shelf action recognition system (TSN) that I pretrained on 195 classes. Now I want to finetune it on another dataset that has 25 classes. When I load my pretrained model, I do NOT get a class-mismatch error even though I have not changed the last layer. Did the system adapt to the new number of classes by itself, or am I doing something wrong? I am getting very low accuracy.


#2

Most likely your model didn’t adapt itself to the new number of classes. Since the new dataset contains fewer classes than the one the model was pre-trained on, you won’t get an index out of range error.
However, you are “wasting” model capacity, since the majority of the output neurons aren’t used (the neurons corresponding to classes 26 to 195).
I would recommend changing the last layer to match the new number of classes, as I would guess you’ll see a performance boost.
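
To illustrate why no error shows up, here is a minimal sketch. Random tensors stand in for the real model output and labels (an assumption for illustration only); the sizes 195 and 25 are taken from the question. The loss only needs every label to be a valid index into the output vector, so 25 labels against 195 outputs pass silently, while the reverse case is rejected.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_pretrained_classes = 195  # size of the old output layer
num_new_classes = 25          # labels in the new dataset are 0..24

# Stand-in for the model output: a batch of 8 samples with 195 logits each.
logits = torch.randn(8, num_pretrained_classes)
# Stand-in labels drawn from the new dataset's label range.
targets = torch.randint(0, num_new_classes, (8,))

# Every target is a valid index into the 195 logits, so this runs silently.
loss = nn.CrossEntropyLoss()(logits, targets)
no_error_with_fewer_classes = bool(torch.isfinite(loss))

# The opposite case fails: a label outside the logit range is rejected.
small_logits = torch.randn(8, num_new_classes)
bad_targets = torch.full((8,), num_pretrained_classes - 1, dtype=torch.long)
try:
    nn.CrossEntropyLoss()(small_logits, bad_targets)
    raised_for_out_of_range = False
except (IndexError, RuntimeError):
    raised_for_out_of_range = True
```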


(lauren) #3

Thank you very much for your fast reply. Do you think the accuracy will increase?


#4

That would be my guess, but I can’t promise anything. :wink:


(lauren) #5

So I added the line model.fc = torch.nn.Linear(195,25) to change the last layer from 195 to 25 outputs, but I am getting this error:
raise KeyError('missing keys in state_dict: "{}"'.format(missing))
KeyError: 'missing keys in state_dict: "set(['fc.weight', 'fc.bias'])"'


#6

It seems you are using a pre-trained model. If that’s the case, you should load the state_dict with the old architecture and change the last layer afterwards. This will make sure that all parameters will be found.
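
A rough sketch of that order of operations, under some assumptions: TinyNet is a hypothetical stand-in for the real network, and the attribute name fc is taken from the post above, so the actual model may name its classifier differently. Note that the first argument of nn.Linear is the number of input features of the classifier, which is read from the old layer here instead of using the old class count.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for the real network; only `fc` matters here."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Linear(16, 32)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))

# Pretend this state_dict came from the 195-class checkpoint.
pretrained_state = TinyNet(num_classes=195).state_dict()

# 1) Build the model with the OLD architecture so every key matches.
model = TinyNet(num_classes=195)
model.load_state_dict(pretrained_state)

# 2) Only now replace the classifier. in_features is read from the old layer
#    (the size of the features feeding the classifier), not the old class count.
model.fc = nn.Linear(model.fc.in_features, 25)

# 3) Optionally freeze everything except the new head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc.")

out = model(torch.randn(4, 16))
trainable_names = [n for n, p in model.named_parameters() if p.requires_grad]
```

Loading first and swapping afterwards means the checkpoint's fc.weight and fc.bias are found during loading, and the freshly initialized 25-class layer simply replaces them.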


(lauren) #7

Thank you very much for answering my questions. I have now frozen all the layers except the last fully connected layer and changed my optimizer from

for group in policies:
    print(('group: {} has {} params, lr_mult: {}, decay_mult: {}'.format(
        group['name'], len(group['params']), group['lr_mult'], group['decay_mult'])))

optimizer = torch.optim.SGD(policies,
                           args.lr,
                           momentum=args.momentum,
                           weight_decay=args.weight_decay)

to

optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), args.lr)

I am getting this error

File "main.py", line 307, in adjust_learning_rate
param_group['lr'] = lr * param_group['lr_mult']
KeyError: 'lr_mult'

Any idea how I can fix it? Thank you again.


#8

Could you explain a bit how you’ve created policies?
Is it a custom dict?


(lauren) #9

I did not create it. I am using an off-the-shelf action recognition system, the PyTorch version of TSN. They define it as

policies = model.get_optim_policies()

#10

I’m not familiar with TSN, but apparently policies does not contain the key 'lr_mult'.
Could you check the repo and see if your usage is correct?
If you can’t figure out the problem, I think the best approach would be to create an issue in the TSN repo.


(lauren) #11

Before making my edits, the code worked fine. But when I added my own optimizer, it gave me this error.


#12

I guess you are using this repo.
In your code you are filtering out all parameters which do not require gradients, thus you are probably breaking the intended usage of the code.
Have a look at these lines of code.
I think if you create your own policies (a list of parameter-group dicts) using this code for your filtered parameters, it should work again.
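
Something along these lines might work as a starting point. It is only a sketch under assumptions: the small nn.Sequential model, the group names, the multiplier values, and the base learning rate and weight decay are all placeholders, and adjust_learning_rate here is a simplified guess modeled on the line in the traceback, not the repo's actual function. The point is that each parameter group passed to the optimizer carries its own lr_mult and decay_mult keys, which an optimizer built from a plain filtered parameter iterator does not have.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model: a frozen backbone and a trainable head.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 25))
for param in model[0].parameters():
    param.requires_grad = False

# Keep only the parameters that still require gradients.
trainable = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
weights = [p for n, p in trainable if n.endswith("weight")]
biases = [p for n, p in trainable if n.endswith("bias")]

# Each group carries the extra keys that adjust_learning_rate reads.
# The multiplier values below are illustrative assumptions.
policies = [
    {"params": weights, "lr_mult": 1, "decay_mult": 1, "name": "trainable weights"},
    {"params": biases, "lr_mult": 2, "decay_mult": 0, "name": "trainable biases"},
]

base_lr, base_weight_decay = 0.001, 5e-4  # placeholders for args.lr / args.weight_decay
optimizer = torch.optim.SGD(policies, base_lr, momentum=0.9,
                            weight_decay=base_weight_decay)

def adjust_learning_rate(optimizer, lr, weight_decay):
    """Simplified per-group update in the style of the traceback above."""
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr * param_group["lr_mult"]
        param_group["weight_decay"] = weight_decay * param_group["decay_mult"]

# The extra keys survive inside optimizer.param_groups, so this no longer
# raises KeyError: 'lr_mult'.
adjust_learning_rate(optimizer, base_lr * 0.1, base_weight_decay)
```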