Consider a network (i) -> (h) -> (o), where i, h, and o are the input, hidden, and output layers, respectively.
I would like to associate a loss Lh with layer h and a loss Lo with layer o. However, I wish to backpropagate Lh only from layer h backward, and Lo only from layer o backward.
Could anyone please point me in the right direction to do so?
Many thanks
Here is a small example:
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 10)
        self.act = nn.ReLU()

    def forward(self, x):
        x1 = self.act(self.fc1(x))  # hidden activation h
        x = self.fc2(x1)            # output o
        return x, x1                # return both so each can receive its own loss
# Create the model and execute a forward pass
criterion = nn.MSELoss()
model = MyModel()
x = torch.randn(1, 10)
o, h = model(x)

# Calculate one loss per layer output
loss_o = criterion(o, torch.rand_like(o))
loss_h = criterion(h, torch.rand_like(h))

# Backward loss_h and keep the intermediate activations,
# since loss_o.backward() will reuse part of the graph
loss_h.backward(retain_graph=True)

# Check that the self.fc2 grads are still empty (None)
for name, param in model.named_parameters():
    print(name, param.grad)

# Backward loss_o
loss_o.backward()

# Gradients are accumulated in self.fc1 and newly populated in self.fc2
grads2 = []
for name, param in model.named_parameters():
    print(name, param.grad)
    grads2.append(param.grad.clone())
You would also get the same gradients if you sum both losses and call .backward() on the resulting tensor.
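To illustrate that, here is a minimal sketch of the summed-loss variant. It assumes the model, criterion, and x defined above; the targets are drawn again at random, so the printed values will differ from grads2, but the gradient flow is the same: self.fc1 receives contributions from both losses, while self.fc2 is only reached by loss_o.

model.zero_grad()  # reset the gradients accumulated by the previous backward calls

# Forward pass again and build a single combined loss
o, h = model(x)
loss = criterion(o, torch.rand_like(o)) + criterion(h, torch.rand_like(h))

# One backward call; autograd routes each term from its own layer backward
loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad)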