Currently I have two instances of the same model, modelA and modelB, each with its own optimizer.
I have a training loop where a forward and backward pass happens on each model and gradients are accumulated (no optim.step() yet).
Now, after the backward passes but before the optimizer steps, I want to sum up the gradients of the two models, so that both end up with the same combined gradient:
sum_grads = modelA.gradients + modelB.gradients
modelA.gradients = sum_grads
modelB.gradients = sum_grads
optimA.step()
optimB.step()
How can I do this in PyTorch?
Thanks
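To make it concrete, this is roughly what I'm imagining (just a sketch, I'm not sure assigning to .grad directly like this is legitimate; it assumes both models are instances of the same class, so zip over parameters() pairs them up correctly):

import torch

# After both backward passes, overwrite each parameter's .grad
# with the elementwise sum of the two models' gradients
with torch.no_grad():
    for pA, pB in zip(modelA.parameters(), modelB.parameters()):
        if pA.grad is None or pB.grad is None:
            continue  # skip parameters that never received a gradient
        summed = pA.grad + pB.grad
        pA.grad = summed          # modelA gets the sum
        pB.grad = summed.clone()  # clone so modelB doesn't share the same tensor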
More details
# Zero grads
optim1.zero_grad()
optim2.zero_grad()

# Model 1: gradients accumulate across all batches of dl1,
# since zero_grad() is only called once, before the loop
for batch_idx, (X, y_true) in enumerate(dl1):
    X = X.to(device)
    y_true = y_true.to(device)
    y_pred = model_p1(X)
    loss1 = loss_1(y_pred, y_true)
    loss1.backward()

# Model 2: same, gradients accumulate across all batches of dl2
for batch_idx, (X, y_true) in enumerate(dl2):
    X = X.to(device)
    y_true = y_true.to(device)
    y_pred = model_p2(X)
    loss2 = loss_2(y_pred, y_true)
    loss2.backward()

# Combine: sum the gradients of corresponding parameters
for pA, pB in zip(model_p1.parameters(), model_p2.parameters()):
    sum_grads = pA.grad + pB.grad
    pA.grad = sum_grads
    pB.grad = sum_grads.clone()

# Update model parameters using the summed gradients
optim1.step()
optim2.step()
This is what I currently have, but it doesn't seem to be working:

for pA, pB in zip(model_p1.parameters(), model_p2.parameters()):
    sum_grads = pA.grad + pB.grad
    pA.grad = sum_grads
    pB.grad = sum_grads.clone()

I want to update the parameters based on the summed gradients.
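One sanity check I wrote to try to narrow it down (the helper name check_grads_match is my own): after the combine loop, I verify that corresponding gradients are identical and that no .grad is still None (if a parameter never received a gradient, pA.grad + pB.grad would raise a TypeError):

import torch

def check_grads_match(m1, m2):
    # Every corresponding parameter pair should now hold an
    # identical (but not shared) gradient tensor
    for pA, pB in zip(m1.parameters(), m2.parameters()):
        assert pA.grad is not None and pB.grad is not None, "a parameter has no gradient"
        assert torch.equal(pA.grad, pB.grad), "gradients differ after combining"
        assert pA.grad is not pB.grad, "gradients share one tensor"

check_grads_match(model_p1, model_p2)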
Thanks