Hi,
I’m trying to build a multi-output / multi-head feedforward neural network with some shared layers and two different heads.
Based on a condition, the forward pass should split the batch so that one group of samples goes into one head and the other group goes into the other head. I use indices to distinguish between the two groups.
Example code (I reduced the code to a minimum. Let me know if you need more information):
def forward(self, x):
    indices_t = (x[:, -1] == 1).nonzero()
    indices_c = (x[:, -1] == 0).nonzero()
    x = x[:, :-1]
    pred_t = None
    pred_c = None

    # Shared layers
    z1 = self.fc1(x)
    a1 = F.elu(z1)
    z2 = self.fc2(a1)
    a2 = F.elu(z2)
    ...
    x_t = torch.index_select(a6, 0, indices_t.flatten())
    x_c = torch.index_select(a6, 0, indices_c.flatten())

    # Head One
    if x_t.shape[0] > 0:
        z1_t = self.fc_t_1(x_t)
        a1_t = F.tanh(z1_t)
        ...
        pred_t = F.softmax(a4_t, dim=1)

    # Head Two
    if x_c.shape[0] > 0:
        z1_c = self.fc_c_1(x_c)
        a1_c = F.tanh(z1_c)
        z2_c = self.fc_c_2(a1_c)
        a2_c = F.tanh(z2_c)
        ...
        pred_c = F.softmax(a4_c, dim=1)

    return pred_t, pred_c
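To make the index split concrete, here is a tiny standalone snippet (with made-up toy values) showing what the selection above does on a batch whose last column is the 0/1 group indicator:

import torch

# The last feature column is the 0/1 group indicator
x = torch.tensor([[0.5, 1.2, 1.0],   # group "t" sample (indicator == 1)
                  [0.3, 0.7, 0.0],   # group "c" sample (indicator == 0)
                  [0.9, 0.1, 1.0]])  # group "t" sample

indices_t = (x[:, -1] == 1).nonzero()
indices_c = (x[:, -1] == 0).nonzero()
features = x[:, :-1]                 # drop the indicator column

x_t = torch.index_select(features, 0, indices_t.flatten())
x_c = torch.index_select(features, 0, indices_c.flatten())

print(x_t.shape)  # torch.Size([2, 2]) -> rows 0 and 2
print(x_c.shape)  # torch.Size([1, 2]) -> row 1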
So far, the forward pass works. The problem is the backward pass / updating the weights.
I read that I can simply add the two losses and then compute the gradients. Unfortunately, the weights of both heads are updated, even when I process only samples from one group. For example, assume there are two groups, X and Y, and we process a single sample that belongs to group X. The idea is to use the loss computed in the forward pass to update the shared layers plus the head belonging to group X, but not the head belonging to group Y. In my case, however, both heads are updated.
Here is an excerpt of the backward step:
for x, y in train_dl:
    pred_t, pred_c = model(x)
    indices_t = (x[:, -1] == 1).nonzero()
    indices_c = (x[:, -1] == 0).nonzero()
    y_t = torch.index_select(y, 0, indices_t.flatten())
    y_c = torch.index_select(y, 0, indices_c.flatten())

    if pred_t is not None:
        loss1 = F.cross_entropy(pred_t, y_t)
        if pred_c is not None:
            loss2 = F.cross_entropy(pred_c, y_c)
            # Case 1: Both losses could be calculated
            loss = loss1 + loss2
        else:
            # Case 2: Only one loss could be calculated
            loss = loss1
    else:
        # Case 3: Only one loss could be calculated
        loss = F.cross_entropy(pred_c, y_c)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
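For what it’s worth, this is roughly how I check which parameters change after a single update (a sketch; model, optimizer, x and y are as in the loop above, and the batch contains only group-t samples, so pred_c is None):

# Snapshot all parameters before the update
before = {name: p.detach().clone() for name, p in model.named_parameters()}

pred_t, pred_c = model(x)                        # batch with only group-t samples
indices_t = (x[:, -1] == 1).nonzero()
y_t = torch.index_select(y, 0, indices_t.flatten())

loss = F.cross_entropy(pred_t, y_t)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Compare against the snapshot: I would expect the fc_c_* parameters
# (head two) to be unchanged, but they change as well.
for name, p in model.named_parameters():
    print(name, "changed:", not torch.equal(before[name], p.detach()))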
I also tried setting requires_grad = False inside the training loop on the layers that should not be updated. Unfortunately, the weights of all layers are still updated on every step.
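Concretely, that attempt looked roughly like this (a sketch of the tail of the training loop; the fc_t_* / fc_c_* names match the layers in the forward code above):

# Freeze the head that received no samples in this batch, update, then unfreeze
if pred_c is None:                               # only group-t samples
    for name, p in model.named_parameters():
        if name.startswith("fc_c_"):             # parameters of head two
            p.requires_grad = False
elif pred_t is None:                             # only group-c samples
    for name, p in model.named_parameters():
        if name.startswith("fc_t_"):             # parameters of head one
            p.requires_grad = False

loss.backward()
optimizer.step()
optimizer.zero_grad()

# Re-enable everything for the next batch
for p in model.parameters():
    p.requires_grad = True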
Thanks in advance.