Using the output twice

In the following setup, all models are nn.Modules (not necessarily stored in separate interfaces). What would be the correct way to handle this setup, given that all models/modules must be updated:

o1 = m1(i)
o2 = m2(o1)
o3 = m3(o1)
loss = loss1(o2, t1) + loss2(o3, t2)
loss.backward()

Specifically, do I need to detach and clone o1 before using it to compute o3, or maybe before computing o2?

EDIT: The reason for asking is that when I checked the weights after training, I got the impression that some weights in m3 are not updated/frozen. I checked: all weights have requires_grad=True.
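
One way to verify whether m3 actually changes is to snapshot its parameters before training and compare afterwards. A minimal sketch, using a placeholder nn.Linear to stand in for m3 and omitting the training loop:

import torch
import torch.nn as nn

m3 = nn.Linear(4, 2)  # placeholder for the real m3

# Snapshot the parameters before training.
before = {name: p.detach().clone() for name, p in m3.named_parameters()}

# ... training loop would run here ...

# After training, report which parameters actually changed.
for name, p in m3.named_parameters():
    status = "unchanged" if torch.allclose(before[name], p.detach()) else "changed"
    print(f"{name}: {status}")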

Basically, m3 will also receive gradients after your loss.backward(); you do not need to detach or clone o1, since autograd accumulates the gradients from both branches into it.
You can try the following code, which simulates your situation, and you will see gradients on m3:

import torch

m1 = torch.nn.Parameter(torch.tensor(1.))
m2 = torch.nn.Parameter(torch.tensor(2.))
m3 = torch.nn.Parameter(torch.tensor(3.))
i = torch.tensor(4.)
o1 = m1 * i
o2 = m2 * o1  # first use of o1
o3 = m3 * o1  # second use of o1, no detach/clone
t1 = 5.
t2 = 6.
loss = (o2 - t1) + (o3 - t2)
loss.backward()
print(m2.grad, m3.grad)
>>> tensor(4.) tensor(4.)
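
The same holds when m1, m2 and m3 are real nn.Modules. A minimal sketch, with arbitrary Linear layers and MSE losses standing in for your models and loss1/loss2, shows that gradients reach m3 after a single backward:

import torch
import torch.nn as nn

# Arbitrary stand-ins for the real modules and losses.
m1, m2, m3 = nn.Linear(4, 4), nn.Linear(4, 3), nn.Linear(4, 2)
loss1 = loss2 = nn.MSELoss()

i = torch.randn(8, 4)
t1 = torch.randn(8, 3)
t2 = torch.randn(8, 2)

o1 = m1(i)
o2 = m2(o1)
o3 = m3(o1)  # o1 is reused as-is, no detach/clone
loss = loss1(o2, t1) + loss2(o3, t2)
loss.backward()

print(m3.weight.grad.abs().sum())  # non-zero: gradients flow into m3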

But I can suggest a few possibilities that might cause your zero gradients:

  1. You forgot to call optimizer.step() (or another way to update your weights).
  2. Your learning rate is zero.
  3. The gradient of m3 really is computed to be zero in your setup (maybe the loss2 you use produces a zero gradient); see the check sketched after this list.
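
To check case 3, inspect the per-parameter gradients right after loss.backward() and before the optimizer zeroes them. A minimal sketch with a hypothetical helper (report_grads is not part of PyTorch):

import torch.nn as nn

def report_grads(**modules: nn.Module) -> None:
    # Print each parameter's gradient norm; call right after loss.backward().
    for mod_name, module in modules.items():
        for p_name, p in module.named_parameters():
            if p.grad is None:
                print(f"{mod_name}.{p_name}: no gradient (not part of the graph?)")
            else:
                print(f"{mod_name}.{p_name}: grad norm = {p.grad.norm().item():.4e}")

# Usage, after loss.backward() and before optimizer.zero_grad():
# report_grads(m1=m1, m2=m2, m3=m3)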

Thanks. I found the error: there was a bug in the loss computation loop.
