Hi,
I believe that autograd leaves `.grad` as `None` for all the parameters where you have set `requires_grad=False`, so those parameters are never updated. An optimizer works fine with frozen params even when you pass everything, as in `optim = torch.optim.AdamW(model.parameters(), lr=param['lr'], amsgrad=True)`, because it simply skips any param whose `.grad` is `None`; essentially, those params just have `requires_grad` set to `False`. My suggestion is to unfreeze the params, and you will see a difference in training time.
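Here is a minimal sketch of what I mean (the toy model and shapes are just for illustration, not from your code): the frozen layer's `.grad` stays `None` after `backward()`, so the optimizer never touches it, even though it was passed in.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model, just to demonstrate the behavior
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Freeze the first layer
for p in model[0].parameters():
    p.requires_grad = False

# Passing ALL parameters still works: the optimizer skips any param
# whose .grad is None, and autograd never populates .grad for frozen ones
optim = torch.optim.AdamW(model.parameters(), lr=1e-3, amsgrad=True)

loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Frozen layer: .grad is still None; trainable layer: .grad is populated
assert model[0].weight.grad is None
assert model[1].weight.grad is not None

# Equivalent alternative: pass only the trainable params explicitly
optim = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,
    amsgrad=True,
)
```

Passing only the trainable params (the second form) also keeps the optimizer from allocating state for frozen ones, but either way the frozen params are not updated.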
For more info, check out this