I am trying to add loss terms computed from the outputs of intermediate layers, but I get an error saying that it “cannot compute the gradients with respect to labels”. To be more explicit:

Say you have an architecture like

```
layer1 = self.layer1(input)
layer1 = F.relu(layer1)
layer2 = self.layer2(input)
layer2 = F.relu(layer2)
layer3 = self.layer3(input)
layer3 = F.relu(layer3)
```

I would like to use a loss term like

```
criterion = nn.MSELoss()
loss_term = criterion(layer2, layer1)
```

I then get the error mentioned above: “cannot compute gradients with respect to labels. Either mention requires_gradients = False or set the variable as volatile”. (The error message is approximate, since I don’t have PyTorch or my code at hand to reproduce it quickly.) I need to implement something like the above snippet. Can anyone please help?
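
For reference, here is a self-contained sketch of the setup. The `Net` class, the `nn.Linear` layers, and the sizes (8 inputs, 4 hidden units, batch of 16) are hypothetical placeholders, not my real model. One reading of the error is that the loss refuses to differentiate with respect to its second argument (the target), so the sketch treats `layer1` as a fixed target by calling `.detach()` on it. This assumes gradients only need to flow through `layer2`, which may or may not be what is wanted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Hypothetical module: three parallel linear layers fed the same input,
    # mirroring the snippet above. Sizes are placeholders.
    def __init__(self, in_features=8, hidden=4):
        super().__init__()
        self.layer1 = nn.Linear(in_features, hidden)
        self.layer2 = nn.Linear(in_features, hidden)
        self.layer3 = nn.Linear(in_features, hidden)

    def forward(self, x):
        layer1 = F.relu(self.layer1(x))
        layer2 = F.relu(self.layer2(x))
        layer3 = F.relu(self.layer3(x))
        return layer1, layer2, layer3


torch.manual_seed(0)
net = Net()
x = torch.randn(16, 8)  # dummy batch

layer1, layer2, layer3 = net(x)

criterion = nn.MSELoss()
# Assumption: layer1 acts as a constant target, so it is detached from the
# graph. Gradients then flow only into the parameters that produced layer2.
loss_term = criterion(layer2, layer1.detach())
loss_term.backward()
```

After `backward()`, `net.layer2.weight.grad` is populated, while `net.layer1.weight.grad` stays `None` because the target was detached. If both layers should be pulled toward each other, detaching is not the right choice and the loss would need gradients with respect to both arguments.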