Update model parameters using intermediate layer loss

I am trying to implement a specific functionality using a neural network. I have attached the block diagram of my system. Basically, I am trying to develop a recurrent cell-like functionality where the output of a certain layer (Output of FC2) is used as feedback input to a previous layer (CONV1).

The difference from an RNN cell is that the feedback should update the parameters of the CONV1 layer and the whole network’s parameters should be updated based on an intermediate layer loss.

A simplified version of the code is given below. With my current implementation, the FC2 layer parameters do not get updated during training.

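import torch
import torch.nn as nn
import torch.nn.functional as F

# custom_conv, cnn_N_filt, cnn_len_filt and fs are defined elsewhere (not shown)
class Net(nn.Module):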
    def __init__(self, inputSize, num_classes):
        super(Net, self).__init__()

        # Filtering
        self.conv1 = custom_conv(cnn_N_filt[0], cnn_len_filt[0], fs)

        # Backend
        self.fc1 = nn.Linear(x, y) # Size based on previous layer output

        # Parameter calculation
        self.fc2 = nn.Linear(num_classes+inputSize, 1)

    def forward(self, x, hidden):
        x = F.relu(self.conv1(x, hidden))
        x = F.max_pool1d(x, 2, 2)
        x = x.view(-1, x.shape[1]*x.shape[2])
        x = F.relu(self.fc1(x))
        x = torch.cat((x, inputTemp), dim=1)
        hidden = F.relu(self.fc2(x))
        hidden = hidden.view(-1)

        # torch.sigmoid replaces the deprecated F.sigmoid
        return F.log_softmax(x, dim=1), torch.sigmoid(hidden)

So, I believe the hidden output from FC2 should be given as a model parameter to CONV1 during each forward pass.
Any ideas on how to achieve this?

Hi, in case you still have this problem, I think the following could work. It may not be the optimal solution, because you make almost two forward passes and manually toggle the requires_grad attribute. I do not know the computational cost of that, so beware.

import torch
import torch.nn as nn

# Example layers
fc1 = nn.Linear(1, 1)
fc2 = nn.Linear(1, 1)
fc3 = nn.Linear(1, 1)
fc4 = nn.Linear(1, 1)
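wrapper = nn.Sequential(fc1, fc2, fc3, fc4) # Wrapper so zero_grad can be called on all layers at once

# Run the forward pass of a module without accumulating gradients
# for its own parameters, but also without cutting the computation graph,
# so gradients still flow back to earlier layers.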
def pass_no_grad(input, module):
    prev_status = {}
    for name, param in module.named_parameters():
        prev_status[name] = param.requires_grad
        param.requires_grad = False
    out = module(input)
    for name, param in module.named_parameters():
        param.requires_grad = prev_status[name]
    return out

t = torch.randn(1, 1)

x1 = t.clone()

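# First path: only fc1 and fc4 should receive gradients
x1 = fc1(x1)
x2 = x1.detach().clone() # Cut the gradient of the other path
x1 = pass_no_grad(x1, fc2)
x1 = pass_no_grad(x1, fc3)
x1 = fc4(x1)
x1.backward() # Populate the gradients for the first path
print('First path grads')
print('1', fc1.weight.grad)
print('2', fc2.weight.grad)
print('3', fc3.weight.grad)
print('4', fc4.weight.grad)

# Second path: only fc2 and fc3 should receive gradients
# set_to_none=False zeroes the existing grads instead of resetting them
# to None, which matches the output shown below
wrapper.zero_grad(set_to_none=False)
x2 = fc2(x2)
x2 = fc3(x2)
x2.backward() # Populate the gradients for the second path

print('Second path grads')
print('1', fc1.weight.grad)
print('2', fc2.weight.grad)
print('3', fc3.weight.grad)
print('4', fc4.weight.grad)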

First path grads
1 tensor([[-0.1179]])
2 None
3 None
4 tensor([[-0.8320]])
Second path grads
1 tensor([[0.]])
2 tensor([[-0.8599]])
3 tensor([[0.0585]])
4 tensor([[0.]])


As you can see, in the first path only the fc1 and fc4 layers receive a gradient. In the second path only fc2 and fc3 have a non-zero gradient. The other modules still have requires_grad=True and their gradients were zeroed rather than cleared, so their gradient is a zero tensor instead of None. A zero gradient does not change the weights on the optimizer step, at least with plain SGD (no momentum or weight decay).
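The claim about zero gradients can be checked with a small sketch. It assumes plain torch.optim.SGD without momentum or weight decay; the layer names fc_zero, fc_none and fc_used are made up for this illustration and are not part of the code above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc_zero = nn.Linear(1, 1)  # will hold an all-zero gradient
fc_none = nn.Linear(1, 1)  # will hold no gradient at all (grad is None)
fc_used = nn.Linear(1, 1)  # will receive a real gradient from backward()

# Assumption: plain SGD, no momentum and no weight decay
params = [p for m in (fc_zero, fc_none, fc_used) for p in m.parameters()]
opt = torch.optim.SGD(params, lr=0.1)

# Snapshot of the weights before the optimizer step
before = [m.weight.detach().clone() for m in (fc_zero, fc_none, fc_used)]

# Simulate the "zeroed but not None" state for fc_zero
for p in fc_zero.parameters():
    p.grad = torch.zeros_like(p)

# Only fc_used takes part in the loss, so only it gets a real gradient
fc_used(torch.ones(1, 1)).sum().backward()
opt.step()

zero_unchanged = torch.equal(fc_zero.weight, before[0])
none_unchanged = torch.equal(fc_none.weight, before[1])
used_changed = not torch.equal(fc_used.weight, before[2])
print(zero_unchanged, none_unchanged, used_changed)
```

With momentum or weight decay enabled, a parameter with a zero gradient can still move, so in that setting it may be safer to leave its grad as None.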
Please let me know if this is what you needed.
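A possible alternative that avoids toggling requires_grad, sketched under the assumption that the layers to skip are plain nn.Linear modules: call torch.nn.functional.linear with detached views of the weight and bias. The helper name linear_frozen is hypothetical, and I have not compared its cost against pass_no_grad.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_frozen(x, layer):
    # Apply an nn.Linear layer using detached parameters: the layer's own
    # weight and bias get no gradient, but gradients still flow through x.
    bias = layer.bias.detach() if layer.bias is not None else None
    return F.linear(x, layer.weight.detach(), bias)

torch.manual_seed(0)
fc1 = nn.Linear(1, 1)
fc2 = nn.Linear(1, 1)
fc3 = nn.Linear(1, 1)

x = torch.randn(1, 1)
out = fc3(linear_frozen(fc1(x), fc2))  # fc2 is used but not trained on this path
out.sum().backward()

fc1_has_grad = fc1.weight.grad is not None
fc2_has_grad = fc2.weight.grad is not None
fc3_has_grad = fc3.weight.grad is not None
print(fc1_has_grad, fc2_has_grad, fc3_has_grad)
```

Because the parameters are detached only for this one call, fc2 can still be trained normally on another path in the same iteration.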