Hi, I have a dynamic graph that adds/removes layers after some epochs.
I realized that the optimizer is not aware of the newly added layers.
Q. Is it safe to call opt.add_param_group() at runtime, after a few iterations of opt.step()?
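For concreteness, here is a minimal sketch of what I mean (the SGD setup and the new_layer module are placeholders I made up for illustration):

import torch

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# ... several iterations of forward/backward/optimizer.step() ...

# a hypothetical layer created at runtime
new_layer = torch.nn.Linear(10, 10)
# register its parameters with the already-running optimizer;
# hyperparameters not given here are filled in from the optimizer defaults
optimizer.add_param_group({'params': new_layer.parameters()})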
Q. How can I delete some parameters from opt? (To free memory at runtime.)
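What I currently have in mind is poking at the optimizer internals directly, roughly like the sketch below (old_layer is a placeholder; I am relying on optimizer.state being keyed by the parameter tensors, and I am not sure this is officially supported):

params_to_remove = set(old_layer.parameters())

# drop the per-parameter state (e.g. momentum buffers) to reclaim memory
for p in params_to_remove:
    optimizer.state.pop(p, None)

# drop the parameters themselves from every param group
for group in optimizer.param_groups:
    group['params'] = [p for p in group['params'] if p not in params_to_remove]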
There was a recommendation to reconstruct the optimizer, but I think that would lose optimizer state such as momentum for the existing parameters. (Cloning the optimizer would be the last resort.)
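If reconstruction really is the way to go, the best I can come up with is copying the per-parameter state over by hand, roughly like this (a sketch, assuming the surviving parameter tensors are reused as the keys of the state mapping):

old_state = optimizer.state  # per-parameter state, keyed by the parameter tensors
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# carry over momentum buffers etc. for the parameters that survived
for p in model.parameters():
    if p in old_state:
        optimizer.state[p] = old_state[p]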
@colesbury, from our discussion about when fresh graphs are created in PyTorch (What does the backward() function do?), what I really want to get right is making sure that the parameters I create dynamically during training are actually included in the forward computation. You seem to imply that, for each iteration in which I do an update, I should basically create a new loss function. Right? As follows:
## new parameters
add_new_parameters(model, W_new)  # my own helper that attaches W_new to the model
# register the new parameters with the optimizer as well,
# otherwise optimizer.step() will never update them
optimizer.add_param_group({'params': W_new})
# recreate the loss (note: the loss module itself holds no parameters)
loss = torch.nn.CrossEntropyLoss(reduction='mean')
# Reset gradients
optimizer.zero_grad()
# Forward
fx = model(x)
output = loss(fx, y)
# Backward
output.backward()
# Update parameters
optimizer.step()
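(Note: I added the optimizer.add_param_group() call above because, as far as I understand, recreating the loss object by itself registers nothing with the optimizer; the autograd graph is rebuilt on every forward pass anyway, so the new parameters enter the forward computation automatically once the model uses them.)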