I am trying to take the grad values of all the parameters in the model (after calling backward) and compute their variance.
Then I want to use this variance as a loss to update only some of the parameters.
baseline is an nn.Sequential() module inside the model.
Does the following code perform the desired task?
optimizer = optim.Adam(model.parameters(), lr=opts['lr'])
cv_optimizer = optim.Adam(model.baseline.parameters(), lr=10*opts['lr'])
#
# Training script
#
optimizer.zero_grad()
cv_optimizer.zero_grad()
loss.backward(create_graph=True)  # create_graph=True so the .grad tensors are themselves differentiable
flat_params = []
for param in model.parameters():
    if param.grad is None:
        print("None found ", param)
        flat_params.append(torch.zeros_like(param.data).view(-1))
    else:
        flat_params.append(param.grad.view(-1))
flat_params = torch.cat(flat_params, 0)
var_loss = (flat_params**2).mean()  # note: this is the mean of squares (second moment), not the variance; use flat_params.var() for the variance
var_loss.backward(inputs=list(model.baseline.parameters()))  # accumulate var_loss grads into the baseline parameters only
cv_optimizer.step()
optimizer.step()
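For reference, here is a minimal self-contained sketch of the pattern under the same assumptions: the model and layer sizes are placeholders invented for illustration, and `flat.var()` is used for the actual variance. The key points it demonstrates are that the first backward needs create_graph=True so the grads carry a graph, and that backward's inputs= argument restricts which parameters accumulate the second gradient.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical toy model mirroring the question's setup: a main layer
# plus a "baseline" nn.Sequential head (names and sizes are placeholders).
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Linear(4, 4)
        self.baseline = nn.Sequential(nn.Linear(4, 1))

    def forward(self, x):
        return self.main(x) + self.baseline(x)

model = Model()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
cv_optimizer = optim.Adam(model.baseline.parameters(), lr=1e-2)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
cv_optimizer.zero_grad()
# Keep the graph so the grads themselves are differentiable.
loss.backward(create_graph=True)

# Flatten all grads and take their variance.
flat = torch.cat([p.grad.view(-1) for p in model.parameters()])
var_loss = flat.var()

# Accumulate d(var_loss)/d(param) into the baseline parameters only.
var_loss.backward(inputs=list(model.baseline.parameters()))

cv_optimizer.step()
optimizer.step()
```

Without create_graph=True the second backward would fail, because the .grad tensors produced by the first backward would have no graph attached.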