Copy a model's gradients to another model

Hi all,

I am trying to implement the communication of gradients between two models with the same architecture, i.e., copying model A’s gradients to model B.

In model A, I use:
gradients = {}
inputs = inputs.to(device)
labels = labels.to(device)
outputs = modelA(inputs)
optimizerA.zero_grad()
loss = criterion(outputs, labels)
loss.backward()
for name, param in modelA.named_parameters():
    gradients[name] = param.grad.clone()

In model B, I use:
optimizerB.zero_grad()
for name, param in modelB.named_parameters():
    param.grad = gradients[name]
optimizerB.step()
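
Putting the two snippets together, a minimal self-contained version of what I am doing looks like this (the model, data, and learning rate below are just placeholders, not my real CIFAR-10 setup; the None-grad check is only there for robustness):

import torch
import torch.nn as nn

# Placeholder models with identical architectures, only to show the hand-off.
modelA = nn.Linear(10, 2)
modelB = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizerA = torch.optim.SGD(modelA.parameters(), lr=0.1)
optimizerB = torch.optim.SGD(modelB.parameters(), lr=0.1)

inputs = torch.randn(8, 10)
labels = torch.randint(0, 2, (8,))

# --- model A side: compute and store gradients by parameter name ---
optimizerA.zero_grad()
loss = criterion(modelA(inputs), labels)
loss.backward()
gradients = {name: p.grad.detach().clone()
             for name, p in modelA.named_parameters()
             if p.grad is not None}

# --- model B side: install A's gradients and take an optimizer step ---
optimizerB.zero_grad()
for name, p in modelB.named_parameters():
    if name in gradients:
        p.grad = gradients[name].to(p.device)
optimizerB.step()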

I am not sure if there is any error in this part, but the test accuracy on CIFAR-10 stays at around 10% to 15% after 50 epochs. I would really appreciate some help.

The core logic looks correct (at least I don’t see any obvious issues).

Could you explain why you expect this approach to work at all?
Since the two models have different parameters, I would expect modelB’s training to fail if it is updated with modelA’s gradients.
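
To make that concrete, here is a toy example of my own (a single scalar parameter, nothing from your setup) where stepping modelB with modelA’s gradient moves B away from the minimum:

import torch

# Both "models" minimize f(w) = (w - 3)^2 but start from different parameters.
wA = torch.tensor(0.0, requires_grad=True)
wB = torch.tensor(10.0, requires_grad=True)

lossA = (wA - 3) ** 2
lossA.backward()              # wA.grad = 2 * (0 - 3) = -6

# Update wB with wA's gradient (plain SGD step, lr = 0.1).
with torch.no_grad():
    wB -= 0.1 * wA.grad       # 10 - 0.1 * (-6) = 10.6

# wB moved from 10.0 to 10.6, further away from the minimum at 3; its own
# gradient at 10.0 would have been +14, pointing in the opposite direction.
print(wB.item())              # ~10.6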

Sorry, I forgot to mention that I synchronize the weights of the two models after each training epoch by passing modelB’s state_dict to modelA, which then loads it. So if modelA can converge to a minimum of the loss function, modelB should converge to the same point. If my reasoning is correct, this approach should work. I am sure modelA can reach at least 70% test accuracy when I train it on its own.
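
Concretely, the synchronization is just this one call at the end of every epoch (using the modelA/modelB names from above, and assuming both models are on the same device):

# Copy modelB's current weights back into modelA so that the next epoch's
# gradients are computed at the parameters modelB now has.
modelA.load_state_dict(modelB.state_dict())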

By the way, I define modelA and modelB, and optimizerA and optimizerB, as global variables in two different Python files. Could that be the source of the problem?
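
To be concrete about the layout (heavily simplified; Net and the file names below are placeholders for my real code):

# file: side_a.py (sketch)
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):              # placeholder for my real architecture
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 * 32 * 32, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))

# module-level ("global") objects for the A side
modelA = Net().to(device)
optimizerA = torch.optim.SGD(modelA.parameters(), lr=0.01)

# file: side_b.py (sketch) follows the same pattern with its own globals:
#   modelB = Net().to(device)
#   optimizerB = torch.optim.SGD(modelB.parameters(), lr=0.01)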