Hi all,
Maybe someone can clarify the following doubt. I have a previously trained model, say a ResNet with 100 classes, and I want to add 10 new classes. I want to check the performance when training only the weights and biases of these 10 new classes in the last FC layer. So I have copied the CNN layers as follows:
for key_old, value_old in dict(model_old.named_parameters()).items():
    if not ('fc_final' in key_old):
        for key, value in dict(self.named_parameters()).items():
            if key == key_old:
                if value.dim() == 2:
                    value.data[:self.num_classes_old, :] = value_old.data
                    value.requires_grad = False
                elif value.dim() == 1:
                    value.data[:self.num_classes_old] = value_old.data
                    value.requires_grad = False
where _old refers to the previously trained model and names without _old to the new one. The last FC layer was previously named fc_final. The dim() == 2 / dim() == 1 check distinguishes the weight matrices from the bias vectors.
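As an aside, the same copy can be written without .data by using copy_ under torch.no_grad(). A minimal sketch with hypothetical layer sizes (old_fc, new_fc, and the dimensions are made up for illustration):

```python
import torch
import torch.nn as nn

num_classes_old = 3
old_fc = nn.Linear(4, num_classes_old)  # hypothetical old head: 3 classes
new_fc = nn.Linear(4, 5)                # hypothetical new head: 5 classes total

with torch.no_grad():  # no autograd tracking while copying parameters
    new_fc.weight[:num_classes_old, :].copy_(old_fc.weight)
    new_fc.bias[:num_classes_old].copy_(old_fc.bias)

print(torch.equal(new_fc.weight[:num_classes_old], old_fc.weight))  # True
```

The parameters stay leaves with requires_grad=True; only the copy itself happens outside the graph.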
For the FC layer I do something similar, copying the old terms but allowing gradients:
for key_old, value_old in dict(model_old.named_parameters()).items():
    if 'fc_final' in key_old:
        for key, value in dict(self.named_parameters()).items():
            if key == key_old:
                if value.dim() == 2:
                    value.data[:self.num_classes_old, :] = value_old.data
                    value.requires_grad = True
                elif value.dim() == 1:
                    value.data[:self.num_classes_old] = value_old.data
                    value.requires_grad = True
Here, self.num_classes_old is the number of classes in the previously trained model.
Up to here there is no problem, everything is working. However, my question arises now: is there an option to set requires_grad to False for the previous weights and bias terms and to True for the new ones? I.e.
value.data[:self.num_classes_old, :] should have requires_grad = False
value.data[self.num_classes_old:, :] should have requires_grad = True
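As far as I know, requires_grad is a flag on the whole tensor, not on a slice, so it cannot differ per row. One common workaround is to keep requires_grad=True for the whole parameter and zero out the gradient of the old rows with a hook. A minimal sketch with hypothetical sizes (3 old classes out of 5):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes_old = 3
fc = nn.Linear(4, 5)  # 5 total classes; the first 3 are the old ones

# requires_grad applies to the whole tensor, not to a slice, so instead
# zero the gradient of the old rows on every backward pass via a hook.
def freeze_old_rows(grad):
    grad = grad.clone()
    grad[:num_classes_old] = 0
    return grad

fc.weight.register_hook(freeze_old_rows)
fc.bias.register_hook(freeze_old_rows)

out = fc(torch.randn(2, 4)).sum()
out.backward()
print(fc.bias.grad)  # tensor([0., 0., 0., 2., 2.])
```

One caveat: even with zero gradients, an optimizer with weight decay or accumulated momentum can still move the old rows, so plain SGD (lr only) is the safest choice if those rows really must stay fixed.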
Some more info. Following the comments at:
I have tried to use detach() and I have changed
value[:self.num_classes_old] to value[:self.num_classes_old].detach()
However, checking the weights and bias terms, I see that they still change after training, so the detachment is not working and I do not know why. Am I missing something? In fact, I do not understand exactly how detach() works; I just know that it stops gradient propagation.
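For what it's worth, my understanding is that detach() returns a new tensor that shares storage with the original but is cut out of the autograd graph; it does not modify the original parameter, and it only has an effect when applied inside the forward computation of every iteration. Detaching once at copy time changes nothing. A small self-contained demo:

```python
import torch

w = torch.ones(3, requires_grad=True)

# detach() builds a new tensor sharing w's storage, outside the graph;
# it does not modify w itself, so this line on its own freezes nothing.
v = w[:2].detach()
loss = (w * w).sum()
loss.backward()
print(w.grad)  # tensor([2., 2., 2.])

# To actually block gradients for a slice, detach inside the forward pass:
w.grad = None
out = torch.cat([w[:2].detach(), w[2:]]).pow(2).sum()
out.backward()
print(w.grad)  # tensor([0., 0., 2.])
```

And as with the hook approach, an optimizer with weight decay or momentum can still move the detached rows even when their gradient is zero, which might explain the changes you observed.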
Finally, one more question. From:
I have seen that the use of .data is not recommended. Does anybody know why? I use it a lot and I would like to understand why it is discouraged.
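From what I understand, .data is discouraged because writes through it are invisible to autograd's version tracking: if a tensor saved for backward is modified through .data, backward silently computes a wrong gradient instead of raising an error. A small illustration (not your code, just a toy example):

```python
import torch

w = torch.ones(3, requires_grad=True)
y = (w * w).sum()   # autograd saves w for the backward pass
w.data[0] = 10.0    # .data sidesteps the version check on saved tensors
y.backward()
print(w.grad)       # tensor([20., 2., 2.]) -- silently wrong: the forward
                    # pass was computed with w[0] == 1, not 10

# The same write without .data is caught by autograd:
caught = False
v = torch.ones(3, requires_grad=True)
z = (v * v).sum()
with torch.no_grad():
    v[0] = 10.0
try:
    z.backward()
except RuntimeError:
    caught = True
print(caught)  # True
```

For in-place updates that should not be tracked (like your parameter copies), wrapping them in torch.no_grad() gives the same effect as .data while keeping autograd's safety checks for everything else.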
Thanks in advance.