Is detach() the opposite of add_param_group?

datacrisis · April 28, 2019, 12:33pm

Hello, I have a question about methods like detach() (and setting requires_grad to False), and their relation to add_param_group. Do these two functions serve exactly opposite purposes, with one undoing the other?

For example, if I build a net that may freeze or unfreeze different layers for each backward pass during training, would it be suitable to use detach() (or set requires_grad to False) to exclude the frozen layers, and then use add_param_group to add them back into the optimizer’s param groups when they are unfrozen?

Thank you!

justusschock · April 28, 2019, 5:22pm

Even though you can use these functions to achieve complementary results, the functions themselves are not complementary at all.
Calling detach() on a tensor returns a new tensor with the same values, device, etc. The only difference is that this tensor is not related to the previous computation graph at all (although it shares memory with the original tensor).
Adding a new param group has nothing to do with the computation graph; it only defines which parameters the optimizer is allowed to update.
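A minimal sketch of the distinction, assuming a hypothetical two-layer model (the names backbone and head are made up for illustration): detach() changes what backward() can reach, while add_param_group only changes the list of parameters the optimizer iterates over.

```python
import torch
from torch import nn

# Hypothetical two-layer setup, used only for illustration.
backbone = nn.Linear(4, 3)
head = nn.Linear(3, 1)

x = torch.randn(2, 4)
features = backbone(x)

# detach() returns a new tensor that is cut off from the graph
# but still shares storage with the original.
detached = features.detach()
shares_memory = detached.data_ptr() == features.data_ptr()
detached_requires_grad = detached.requires_grad

# Backpropagating through the detached tensor never reaches the backbone.
head(detached).sum().backward()
backbone_grad_is_none = backbone.weight.grad is None
head_grad_exists = head.weight.grad is not None

# add_param_group does not touch the graph; it only registers
# more parameters with the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
groups_before = len(optimizer.param_groups)
optimizer.add_param_group({"params": list(backbone.parameters()), "lr": 0.01})
groups_after = len(optimizer.param_groups)
```

In this sketch the backbone ends up registered with the optimizer yet receives no gradient, which shows that the two mechanisms act independently of each other.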

datacrisis · April 29, 2019, 3:36am

@justusschock Thanks for the quick reply!

I see, so the underlying mechanics are different. So what if I do the following:

I freeze and unfreeze layers by toggling requires_grad, depending on the particular graph topology, before each forward-backward pass in the training loop. But when I add new layers back into the graph, I would need to add their parameters back into the optimizer with add_param_group.

Would that work? Also, do I need to remove the parameter group from the optimizer (the opposite of add_param_group) when I freeze layers at some point during the training loop? Or is that already tracked and taken care of when I freeze them with requires_grad?

MariosOreo · April 29, 2019, 4:06am

Hello,

According to the source code, if the grad of a parameter is None, that parameter is not updated by optimizer.step().

So you could just set requires_grad=False on the parameters of the layers you want to freeze.
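A minimal sketch of that approach, assuming plain SGD and two hypothetical layers named frozen and trainable: both layers are registered with the optimizer once, and freezing is done only through requires_grad.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical layers, used only for illustration.
frozen = nn.Linear(4, 3)
trainable = nn.Linear(3, 1)

# Register every parameter with the optimizer up front.
params = list(frozen.parameters()) + list(trainable.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

# Freeze one layer: autograd will no longer compute gradients for it.
for p in frozen.parameters():
    p.requires_grad_(False)

frozen_before = frozen.weight.detach().clone()
trainable_before = trainable.weight.detach().clone()

optimizer.zero_grad()
loss = trainable(frozen(torch.randn(8, 4))).sum()
loss.backward()
optimizer.step()  # parameters whose .grad is None are skipped

frozen_grad_is_none = frozen.weight.grad is None
frozen_unchanged = torch.equal(frozen.weight, frozen_before)
trainable_changed = not torch.equal(trainable.weight, trainable_before)
```

To unfreeze, calling p.requires_grad_(True) on the same parameters should be enough, since they were never removed from the optimizer and so add_param_group is not needed again. One caveat worth checking: the skip depends on .grad being None, not merely zero. If a parameter still holds a zero-filled gradient from an earlier step, an optimizer with momentum or weight decay may still change it; calling optimizer.zero_grad(set_to_none=True) avoids that.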

datacrisis · April 30, 2019, 4:23pm

Ah I see, many thanks @MariosOreo!