Here's a small test program to verify this:
import torch
from torch.autograd import Variable
# define initial data
a = Variable(torch.randn(10), requires_grad=True)
# b plays the role of the parent module's output
b = a * 2
# rewrap the variable so its history is cut here and can be managed manually
b_ = Variable(b.data, requires_grad=True)
c = b_ * b_
d = b_ * 4
e = c + d
# combined way: backprop through e = c + d in a single pass
e.backward(torch.ones(e.size()))
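# b_.grad now holds de/db_; feed it into the parent graph so it reaches a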
b.backward(b_.grad)
agrad_combined = a.grad.data.clone()
# reset a's grad before running the separate version
a.grad.data.zero_()
# separate way: backprop c and d one at a time and accumulate
b = a * 2
b_ = Variable(b.data, requires_grad=True)
c = b_ * b_
d = b_ * 4
c.backward(torch.ones(c.size()))
b.backward(b_.grad)
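# a.grad now holds the c-path contribution; clear b_.grad so the d pass
# starts fresh and doesn't re-send c's gradient through b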
b_.grad.data.zero_()
d.backward(torch.ones(d.size()))
b.backward(b_.grad)
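# the d-path contribution is now added on top of the c-path one in a.grad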
agrad_separate = a.grad.data.clone()
# print difference between combined method and separate method
print(agrad_combined - agrad_separate)
It prints all zeros, so the combined and separate passes give identical gradients for a. This is expected: gradients are additive, so propagating ones through e = c + d in one pass accumulates the same thing into a.grad as propagating c and d one at a time.
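For reference, on recent PyTorch versions (where Variable is deprecated) the same check can be written without it. This is just a minimal sketch of the same idea, with detach().requires_grad_() standing in for the Variable rewrap and retain_graph=True added defensively for the repeated backward through b:

import torch

a = torch.randn(10, requires_grad=True)

# parent part of the graph
b = a * 2
# cut the history: b_ shares b's values but is a fresh leaf
b_ = b.detach().requires_grad_(True)

c = b_ * b_
d = b_ * 4
e = c + d

# combined pass
e.backward(torch.ones_like(e))
b.backward(b_.grad)
agrad_combined = a.grad.clone()

# reset and rebuild the graph for the separate passes
a.grad.zero_()
b = a * 2
b_ = b.detach().requires_grad_(True)
c = b_ * b_
d = b_ * 4

c.backward(torch.ones_like(c))
b.backward(b_.grad, retain_graph=True)  # keep b's graph for the second pass
b_.grad.zero_()
d.backward(torch.ones_like(d))
b.backward(b_.grad)
agrad_separate = a.grad.clone()

print(agrad_combined - agrad_separate)  # all zeros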