Say I want to add a loss term (i.e. gradients from this loss should be propagated) that is computed from the output of a hidden layer itself. For example, we take the output of a hidden layer and pass it to a square function: (hidden_output/activation)^2. How can I implement this in PyTorch?

Thanks in advance!

I’m not sure if I understand the use case correctly, but you could use any output of a layer and add it to the loss before calling `backward`.

Hi! I’m a new user of PyTorch and have only used the predefined functions. Namely, I first computed the outputs of a neural net, compared them with the labels, and passed both to the criterion `nn.CrossEntropyLoss`, as in the example for training a simple classifier. So how do we add the outputs from a layer itself?

Here is a small dummy example that uses an intermediate activation in your loss:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(10, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x1 = F.relu(self.fc1(x))
        x = self.fc2(x1)
        # Return the hidden activation alongside the final output
        return x, x1


x = torch.randn(10, 10)
y = torch.randn(10, 10)

model = MyModel()
optimizer = optim.SGD(model.parameters(), lr=1e-0)
criterion = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    output, aux = model(x)
    loss = criterion(output, y)
    # Auxiliary term: mean squared hidden activation
    loss = loss + (aux**2).mean()
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}, aux norm {}'.format(
        epoch, loss.item(), aux.norm()))
```

Would this work as a starter for your use case or are you dealing with another problem?
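If changing `forward` to return the activation is not an option, a forward hook is another way to get hold of it. The following is only a sketch under assumptions: the `nn.Sequential` model, the `captured` dict, and the `save_activation` function are hypothetical names for illustration, and the ReLU has to be a module (`nn.ReLU`) so a hook can be attached to it.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a model whose forward() we do not want to edit
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

captured = {}  # hypothetical container for the intermediate activation


def save_activation(module, inputs, output):
    # Store the activation without detaching it, so gradients can flow back
    captured["aux"] = output


# Attach the hook to the ReLU module, i.e. the hidden activation
handle = model[1].register_forward_hook(save_activation)

x = torch.randn(10, 10)
y = torch.randn(10, 10)
criterion = nn.MSELoss()

output = model(x)
loss = criterion(output, y) + (captured["aux"] ** 2).mean()
loss.backward()

handle.remove()  # remove the hook once it is no longer needed
```

The stored tensor is still part of the autograd graph, so the extra term is backpropagated in the same way as in the example that returns `x1` from `forward`.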

Thank you! Very cool trick, I was not aware of it. I think this will do. If not, I’ll ask you again.

I believe the aux loss should not impact the weights of fc2 and should only impact fc1. However, when I compare the weights of two networks, one trained with aux and one without, the weights are different. I don’t fully understand why that happened.

```
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(10, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x1 = F.relu(self.fc1(x))
        x = self.fc2(x1)
        return x, x1

x = torch.randn(10, 10)
y = torch.randn(10, 10)

model = MyModel()
model2 = MyModel()
model2.load_state_dict(model.state_dict())

optimizer = torch.optim.SGD(model.parameters(), lr=1e-0)
optimizer2 = torch.optim.SGD(model.parameters(), lr=1e-0)
criterion = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    output, aux = model(x)
    loss = criterion(output, y)
    loss = loss + (aux**2).mean()
    loss.backward()
    optimizer.step()

    optimizer2.zero_grad()
    output2, _ = model2(x)
    loss2 = criterion(output2, y)
    loss2.backward()
    optimizer2.step()
    print((model2.fc2.weight.detach() == model.fc2.weight.detach()).all())
```

In your code you are using `model.parameters()` for both optimizers, so `model2` won’t get any updates.

After fixing this, you would expect to see the same `.fc2.weight` in both models after loading the `state_dict` and after the first weight update. Since `model.fc1` was updated differently than `model2.fc1`, you cannot expect the subsequent iterations of the 100 to yield the same result for `.fc2.weight`: the output, and thus the loss, would be different (otherwise there would be no need to use the `aux` output/loss).
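To make this concrete, here is a sketch of the comparison with `optimizer2` pointed at `model2.parameters()`. It first asks autograd for the gradient of the aux term alone, then records after each step whether the two models still agree. The fixed seed, the three-step loop, and the `fc1_close` / `fc2_close` lists are my own additions for illustration, and `torch.allclose` is used instead of `==` to tolerate floating-point noise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed only to make the run repeatable


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x1 = F.relu(self.fc1(x))
        return self.fc2(x1), x1


x = torch.randn(10, 10)
y = torch.randn(10, 10)

model = MyModel()
model2 = MyModel()
model2.load_state_dict(model.state_dict())

# The aux term is computed before fc2, so it has no gradient w.r.t. fc2.weight
_, aux = model(x)
g_fc1, g_fc2 = torch.autograd.grad(
    (aux ** 2).mean(),
    [model.fc1.weight, model.fc2.weight],
    allow_unused=True,  # returns None for inputs that are not in the graph
)
print("aux grad w.r.t. fc2.weight:", g_fc2)

optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=1.0)  # model2, not model
criterion = nn.MSELoss()

fc1_close = []  # per step: do the fc1 weights of both models still match?
fc2_close = []  # per step: do the fc2 weights of both models still match?
for step in range(3):
    # Model trained with the auxiliary term
    optimizer.zero_grad()
    output, aux = model(x)
    loss = criterion(output, y) + (aux ** 2).mean()
    loss.backward()
    optimizer.step()

    # Model trained without the auxiliary term
    optimizer2.zero_grad()
    output2, _ = model2(x)
    loss2 = criterion(output2, y)
    loss2.backward()
    optimizer2.step()

    fc1_close.append(torch.allclose(model.fc1.weight, model2.fc1.weight))
    fc2_close.append(torch.allclose(model.fc2.weight, model2.fc2.weight))
    print(step, "fc1 match:", fc1_close[-1], "fc2 match:", fc2_close[-1])
```

The aux term has no direct gradient with respect to `fc2.weight`, so the first update of `fc2` matches in both models. From the second step on, `fc2` receives different inputs because `fc1` has already diverged, and its weights drift apart as well.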

Also, you can post code snippets by wrapping them into three backticks ```, which makes debugging easier.