@colesbury from our discussion about when fresh new graphs are created in PyTorch (What does the backward() function do?), what I really want to get right is making sure that the new parameters I create dynamically during training are correctly included in the forward computation. You seem to imply that, on each iteration where I do an update, I should basically create a new loss function. Right? As follows:
# add the new parameters to the model
add_new_parameters(mdl, W_new)
# making sure the new parameters are included: recreate the loss each time?
loss = torch.nn.CrossEntropyLoss(reduction='mean')
# Reset gradient
optimizer.zero_grad()
# Forward (call the module directly instead of .forward())
fx = mdl(x)
output = loss(fx, y)
# Backward
output.backward()
# Update parameters
optimizer.step()
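
For reference, here is a minimal sketch of what I have in mind for add_new_parameters, just so the question is concrete. I am assuming the helper wraps the raw tensor as an nn.Parameter, registers it on the module, and also hands it to the existing optimizer (the names mdl, W_new, and the keyword arguments are just placeholders I made up):

import torch
import torch.nn as nn

def add_new_parameters(mdl, W_new, optimizer=None, name='W_new'):
    # Wrap the raw tensor so autograd tracks it and the module owns it
    param = nn.Parameter(W_new)
    mdl.register_parameter(name, param)
    # If an optimizer already exists, it has to be told about the new
    # parameter too, otherwise optimizer.step() will never update it
    if optimizer is not None:
        optimizer.add_param_group({'params': [param]})
    return param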