I am trying to train two models on two mutually exclusive portions of a dataset.
While training both models, I want to manually extract the gradients from Model A and Model B after the backward pass, average the two sets of gradients, write the averaged gradients back into both models, and only then update the weights.
I am trying it in the following way:
def mergeGrad(modelA, modelB, i):
    listGradA = []
    listGradB = []
    itr = 0
    for pA, pB in zip(modelA.parameters(), modelB.parameters()):
        listGradA.append(pA.grad)
        listGradB.append(pB.grad)
        avg = (pA.grad + pB.grad) / 2
        pA.grad = avg
        pB.grad = avg
        itr += 1
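To sanity-check the arithmetic, here is a toy stand-in for the averaging step (plain Python, with lists playing the role of the gradient tensors, so it runs without PyTorch; mergeGrad itself operates on the real .grad tensors):

```python
# Toy stand-in: lists play the role of the two models' gradients.
gradA = [2.0, 4.0]
gradB = [4.0, 8.0]

# Element-wise average, as in mergeGrad:
avg = [(a + b) / 2 for a, b in zip(gradA, gradB)]
print(avg)  # [3.0, 6.0]

# Assigning the same object to both, as mergeGrad does with pA.grad / pB.grad:
gradA = avg
gradB = avg
print(gradA is gradB)  # True: both names now refer to one shared object
```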
In the optimization portion, I am doing:
optimizerA.zero_grad()
lossA.backward()
optimizerB.zero_grad()
lossB.backward()
print(next(modelA.parameters()).grad, next(modelB.parameters()).grad)  # before merging
mergeGrad(modelA, modelB, i)
print(next(modelA.parameters()).grad, next(modelB.parameters()).grad)  # after merging
I am printing the gradients before and after averaging and writing them back. In the very first iteration this works as expected: the first print statement shows different gradients for modelA and modelB, and right after merging I get the averaged gradients, as intended.
From the second iteration onward, however, both models always produce identical gradients after the backward pass (and hence identical averaged values). How is this possible?
The models are seeing different data!
Is the manual gradient modification I am doing wrong?
Please suggest a better way to manually modify the gradients and write them back into the models!
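For instance, would a variant along these lines be safer? This is an untested sketch with a hypothetical name, merge_grad_cloned; it assumes the right fix is to give each model its own copy of the averaged tensor via .clone(), rather than pointing both .grad attributes at one shared tensor:

```python
import torch

def merge_grad_cloned(modelA, modelB):
    # Hypothetical variant: each model receives its own copy of the average,
    # so the two .grad attributes do not alias one shared tensor.
    for pA, pB in zip(modelA.parameters(), modelB.parameters()):
        avg = (pA.grad + pB.grad) / 2
        pA.grad = avg.clone()
        pB.grad = avg.clone()
```

This assumes modelA and modelB have identical architectures (so their parameters zip up one-to-one) and that backward() has already been called on both losses before merging.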
Thanks a lot!