Use gradient of hidden layer wrt network inputs in the calculation of the loss function


I’d like to use the gradient of the output of a hidden layer wrt the network input as one of the terms in the calculation of the network loss. Any suggestion of how this could be done?

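You could take the gradient of hidden_out wrt the input with grad_input = torch.autograd.grad(hidden_out, (input,), create_graph=True), and then use that however you’d like in the final loss. Pass create_graph=True so that the loss term built from grad_input can itself be backpropagated. If hidden_out is not a scalar, reduce it first (e.g. hidden_out.sum()) or pass grad_outputs.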

And where can I find hidden_out?

That depends on your network and use case.
One way to do it:

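import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearModel(nn.Module):
  def __init__(self, input_size, output_size, hidden_size):
    super(LinearModel, self).__init__()
    self.fc1 = nn.Linear(input_size, hidden_size)
    self.hidden = nn.Linear(hidden_size, hidden_size)
    self.fc2 = nn.Linear(hidden_size, output_size)

  # def forward(self, x):
    # body

lm = LinearModel(input_size=8, output_size=2, hidden_size=4)
inp_x = torch.randn(1, 8, requires_grad=True)  # the input must require grad
# Recompute the hidden layer's output directly from the model's submodules
hidden_out = lm.hidden(F.relu(lm.fc1(inp_x)))

# Reduce to a scalar before differentiating; the result is a tuple with one tensor per input.
# Add create_graph=True if grad_input is going to feed into the loss.
grad_input = torch.autograd.grad(hidden_out.sum(), (inp_x,))
# e.g. (tensor([[-0.0315,  0.0286,  0.0313,  0.0286, -0.0108, -0.0141,  0.0190, -0.0012]]),)
# (values vary with the random initialization)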
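
To actually use that gradient as a term in the loss, something along these lines might work. This is only a sketch under some assumptions that are not from the question: the forward pass returns the hidden activation alongside the output, the task loss is MSE against a made-up target, the extra term is the squared norm of the gradient, and lam is a hypothetical weighting factor. The key part is create_graph=True, which keeps the graph of the gradient computation so that loss.backward() can differentiate through grad_input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class LinearModel(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.hidden = nn.Linear(hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Assumed forward pass: return the hidden activation as well,
        # so the caller can differentiate it wrt the input.
        hidden_out = self.hidden(F.relu(self.fc1(x)))
        out = self.fc2(F.relu(hidden_out))
        return out, hidden_out

lm = LinearModel(input_size=8, output_size=2, hidden_size=4)
inp_x = torch.randn(16, 8, requires_grad=True)  # batch of 16 inputs
target = torch.randn(16, 2)                     # made-up regression target

out, hidden_out = lm(inp_x)

# Gradient of the (summed) hidden output wrt the input.
# create_graph=True makes grad_input itself differentiable.
(grad_input,) = torch.autograd.grad(hidden_out.sum(), inp_x, create_graph=True)

task_loss = F.mse_loss(out, target)
grad_penalty = grad_input.pow(2).sum(dim=1).mean()  # assumed penalty: squared L2 norm per sample
lam = 0.1                                           # hypothetical weight for the gradient term
loss = task_loss + lam * grad_penalty

lm.zero_grad()
loss.backward()  # backpropagates through both the task loss and the gradient term
```

After loss.backward(), the parameter gradients include the contribution from the gradient term, so an optimizer step would train on both terms. Without create_graph=True, grad_input would be treated as a constant and the penalty would not influence the weights.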

Aha, got the point. Thanks for the help!