Thanks for your inference.
I'll dig into it.
Another potential problem is that I need to set requires_grad = True on all of the split parameters, but occasionally I do not want to compute gradients for some of them.
For example,
param = [param0, param1]
optimizer = torch.optim.SGD([param0], lr=0.01, momentum=0.9)
I don't want to compute the gradient for param1, but both parameters have requires_grad = True. If I then call
loss.backward()
This is related to [requires_grad=True/False dynamically]
However, in my case both param0 and param1 are involved in computing the loss function. Will the grad of param1 also be computed? How can I avoid that?
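To make the setup concrete, here is a minimal sketch (param0, param1 and the toy loss are just placeholders, not my real model); my understanding is that the last print would show a populated gradient for param1:

import torch

# Placeholder parameters standing in for my split parameters.
param0 = torch.randn(3, requires_grad=True)
param1 = torch.randn(3, requires_grad=True)

# The optimizer only updates param0 (SGD expects an iterable of parameters).
optimizer = torch.optim.SGD([param0], lr=0.01, momentum=0.9)

# Both parameters are involved in the toy loss.
loss = (param0 * param1).sum()
loss.backward()

print(param0.grad)  # populated, as expected
print(param1.grad)  # also populated, even though the optimizer never updates param1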
I realize that detach_() may be useful, but there is no corresponding attach() method. [https://github.com/pytorch/pytorch/pull/6561]
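If I understand detach() correctly, the out-of-place version might already be enough for my case, since it leaves param1 itself untouched in the graph (again, the names and toy loss below are only illustrative):

import torch

param0 = torch.randn(3, requires_grad=True)
param1 = torch.randn(3, requires_grad=True)
optimizer = torch.optim.SGD([param0], lr=0.01, momentum=0.9)

# detach() (out-of-place) returns a view of param1 that is cut off from the
# autograd graph, so no gradient flows back into param1 for this loss.
loss = (param0 * param1.detach()).sum()
loss.backward()

print(param0.grad)  # computed as usual
print(param1.grad)  # None: nothing was accumulated for param1

# param1 itself was never modified, so there is nothing to "re-attach" in a
# later iteration where I do want its gradient again.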