Pre-trained model and finetuning

mab73 · February 5, 2019, 10:46am

I am using an off-the-shelf action recognition system (TSN) that I pretrained on 195 classes. Now I want to finetune it on another dataset that has 25 classes. When I load my pretrained model I do NOT get an error about a class mismatch, even though I have not changed the last layer. Did the system adapt to the new last layer by itself, or am I doing something wrong? I am getting very low accuracy.

ptrblck · February 5, 2019, 10:52am

Most likely your model didn’t adapt itself to the new number of classes. Since the new dataset contains fewer classes than the one the model was pre-trained on, all of your target indices are still valid for the old output layer, so you won’t get an index out of range error.
However, you are “wasting” model capacity, since the majority of the output neurons aren’t used (neurons corresponding to class26 to class195).
I would recommend changing the last layer to match the new number of classes, as I would guess you’ll see a performance boost.
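
To illustrate why no error shows up, here is a minimal sketch. It assumes a plain linear output layer and nn.CrossEntropyLoss as the criterion; the feature size of 64 and the batch size are made up for the example and are not taken from TSN.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the pre-trained output layer with 195 classes.
feature_dim = 64  # assumed feature size, for illustration only
head = nn.Linear(feature_dim, 195)

features = torch.randn(8, feature_dim)
labels = torch.randint(0, 25, (8,))  # new dataset: class indices 0..24

logits = head(features)

# No error is raised, because every label is a valid index into 195 logits.
loss = nn.CrossEntropyLoss()(logits, labels)

# These outputs can never be the target, but they still compete in the softmax.
unused_outputs = logits.shape[1] - 25
```

A label of 195 or larger would raise an error here, which is why a mismatch in the other direction (more classes in the new dataset) is noticed immediately.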

mab73 · February 5, 2019, 11:03am

Thank you very much for your fast reply. Do you think the accuracy will increase?

ptrblck · February 5, 2019, 11:14am

That would be my guess, but I can’t promise anything.

mab73 · February 5, 2019, 12:13pm

So I added this line model.fc = torch.nn.Linear(195,25) to change the last layer from 195 to 25, but I am getting this error
raise KeyError('missing keys in state_dict: "{}"'.format(missing))
KeyError: 'missing keys in state_dict: "set(['fc.weight', 'fc.bias'])"'

ptrblck · February 5, 2019, 12:39pm

It seems you are using a pre-trained model. If that’s the case, you should load the state_dict with the old architecture and change the last layer afterwards. This will make sure that all parameters will be found.
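
A minimal sketch of that order of operations follows. TinyNet is a hypothetical stand-in for the pre-trained network, and the attribute name fc is taken from the post above; the real TSN model may name its last layer differently.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the pre-trained network."""

    def __init__(self, num_classes):
        super().__init__()
        self.base = nn.Linear(32, 16)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.fc(torch.relu(self.base(x)))


# Pretend this state_dict came from the 195-class checkpoint.
pretrained_state = TinyNet(num_classes=195).state_dict()

# 1) Build the OLD architecture and load all weights, so every key is found.
model = TinyNet(num_classes=195)
model.load_state_dict(pretrained_state)

# 2) Only now replace the last layer. in_features is read from the old layer
#    rather than hard-coded; the new layer starts with random weights.
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, 25)

out = model(torch.randn(4, 32))  # output now has 25 columns
```

Note that the first argument of nn.Linear is the input feature size, not the old number of classes, so nn.Linear(195, 25) is only correct if the layer really receives 195 features. Reading in_features from the existing layer avoids having to guess.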

mab73 · February 6, 2019, 10:41am

Thank you very much for answering my questions. I have now frozen all the layers except the last fully connected layer and changed my optimizer from

for group in policies:
    print(('group: {} has {} params, lr_mult: {}, decay_mult: {}'.format(
        group['name'], len(group['params']), group['lr_mult'], group['decay_mult'])))

optimizer = torch.optim.SGD(policies,
                           args.lr,
                           momentum=args.momentum,
                           weight_decay=args.weight_decay)

to

optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), args.lr)

I am getting this error

File "main.py", line 307, in adjust_learning_rate
param_group['lr'] = lr * param_group['lr_mult']
KeyError: 'lr_mult'

Any idea how I can fix it? Thank you again.

ptrblck · February 6, 2019, 12:39pm

Could you explain a bit how you’ve created policies?
Is it a custom dict?

mab73 · February 6, 2019, 12:54pm

I did not create it. I am using an off-the-shelf action recognition system, the PyTorch implementation of TSN. They defined it as

policies = model.get_optim_policies()

ptrblck · February 6, 2019, 1:25pm

I’m not familiar with TSN, but apparently policies does not contain the key 'lr_mult'.
Could you check the repo and see if your usage is correct?
If you can’t figure out the problem, I think the best approach would be to create an issue in the TSN repo.

mab73 · February 6, 2019, 1:39pm

Before I made my edits, the code worked fine. But when I added my own optimizer, it gave me this error.

ptrblck · February 6, 2019, 1:58pm

I guess you are using this repo.
In your code you are filtering out all parameters which do not require gradients, thus you are probably breaking the intended usage of the code.
Have a look at these lines of code.
I think if you create your own policies dict using this code for your filtered parameters, it should work again.
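
As a rough sketch of what I mean, assuming the training script only needs the extra keys that appear in your print statement and traceback ('name', 'lr_mult', 'decay_mult'). The small nn.Sequential model, the group name and the hyperparameter values are placeholders, and adjust_learning_rate is a simplified version of the line from your traceback, not the repo's actual function.

```python
import torch
import torch.nn as nn

# Placeholder model; in your case this would be the TSN model.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 25))

# Freeze everything except the last layer.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]

# One param group that carries the extra keys the training script reads.
# The optimizer keeps unknown keys in optimizer.param_groups.
policies = [{
    'params': trainable,
    'lr_mult': 1,
    'decay_mult': 1,
    'name': 'trainable_fc',  # hypothetical group name
}]

optimizer = torch.optim.SGD(policies, lr=0.001, momentum=0.9, weight_decay=5e-4)


def adjust_learning_rate(optimizer, lr):
    # Simplified: mirrors the line from the traceback.
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr * param_group['lr_mult']


adjust_learning_rate(optimizer, 0.0001)  # no KeyError now
```

Passing a plain filter(...) over model.parameters() creates a single param group with only the optimizer's default keys, which is why the 'lr_mult' lookup failed.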

Yang_Zheng · July 11, 2019, 12:30pm

Have you solved the problem? I would appreciate it if you could share your solution with me.