Propagating loss on multiple output layers

I’m following the tutorial here: http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

In the image, an input and the hidden state are combined and fed through the network, which produces both the next hidden state and an output.

My question is: what if I want to add another route to the network (i.e., predict some binary label in addition to what the RNN in the image already does)? The network would essentially predict two things: (1) a binary label, and (2) the next word in the sequence.

Perhaps a simplified version of my question in code will make it easier to understand. Given the following network architecture:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):

    def __init__(self, inputs):
        super(Model, self).__init__()

        # shared layer: inputs -> 100-dimensional representation
        self.linear1 = nn.Linear(inputs, 100)

        # two output heads, each making a 2-class prediction
        self.l2o1 = nn.Linear(100, 2)
        self.l2o2 = nn.Linear(100, 2)

    def forward(self, inputs):
        o = self.linear1(inputs)

        # each head applies its own log-softmax over the class dimension
        o1 = F.log_softmax(self.l2o1(o), dim=1)
        o2 = F.log_softmax(self.l2o2(o), dim=1)

        return o1, o2

How can I properly propagate the error on this network that generates two binary predictions? Is this possible? The reasoning for doing this is to use both prediction tasks to enforce a stricter loss on the initial Linear layer (inputs -> 100-dimensional representation). Plus, it seems to me it will be more efficient to train one network that predicts two things than two networks that each predict one thing.

Furthermore, given that the RNN in the tutorial shown above returns output, hidden, I assume I would need to modify it to return output, hidden, binary_output so the binary prediction from the secondary path comes back as well. Am I correct in assuming that? And how should I go about calculating the loss, given that I currently call the loss function as loss(output, target)? I would need to somehow incorporate the loss from binary_output as well.
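For concreteness, I imagine the modified forward would end up looking roughly like this sketch (based on the RNN class in the tutorial; i2b is just my placeholder name for the extra binary head):

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.i2b = nn.Linear(input_size + hidden_size, 2)  # extra head for the binary label (my addition)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.softmax(self.i2o(combined))
        binary_output = self.softmax(self.i2b(combined))    # binary prediction from the secondary path
        return output, hidden, binary_output

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)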

Thanks


You can use the output variable to generate two outputs; there are multiple ways to do this, e.g., something like the forward in your code above.

To backprop, just compute a loss on each output, add the losses together, and call backward on the sum.


Calculating the loss for each individual output and summing them together worked, thanks!


I have the same question. Could you please give more details on what you did? Thanks!

I followed @SimonW's advice and simply calculated the loss on each individual output. My forward routine returns o1 and o2. I then calculated loss1 = loss_function(o1, truth1) and loss2 = loss_function(o2, truth2) and summed them up: loss = loss1 + loss2. You can then run backprop on the summed-up loss.
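In code it was essentially the following (a minimal sketch: the Model class is the two-headed one from my first post, and NLLLoss, SGD, and the dummy tensors are just placeholders for whatever setup you use):

import torch
import torch.nn as nn
import torch.optim as optim

model = Model(inputs=10)                    # the two-headed Model from my first post
loss_function = nn.NLLLoss()                # pairs with the log_softmax outputs
optimizer = optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(4, 10)                 # dummy batch of 4 examples
truth1 = torch.randint(0, 2, (4,))          # dummy targets for the first head
truth2 = torch.randint(0, 2, (4,))          # dummy targets for the second head

optimizer.zero_grad()
o1, o2 = model(inputs)                      # forward returns both predictions

loss1 = loss_function(o1, truth1)           # loss on the first output
loss2 = loss_function(o2, truth2)           # loss on the second output
loss = loss1 + loss2                        # sum into a single scalar

loss.backward()                             # one backward pass through the shared layer
optimizer.step()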


Hi @SimonW, may I ask a question?

If the outputs in my network come from different layers (e.g., output1 comes directly from the backbone, and output2 comes from the fc classification layer, i.e., the layer right after the backbone), can I also add their losses together and then backward once?

Or do I need to backward them separately, since the losses constrain the outputs of different layers in the network?

Looking forward to your reply, thank you!

You can sum them together and backprop once.
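For example, a rough sketch (the backbone/classifier split, loss choices, and dummy tensors below are just placeholders for your setup):

import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Linear(10, 64)                 # stand-in for your backbone
classifier = nn.Linear(64, 2)                # fc classification layer after the backbone

x = torch.randn(8, 10)                       # dummy batch
feature_targets = torch.randn(8, 64)         # dummy target for the backbone output
labels = torch.randint(0, 2, (8,))           # dummy class labels

features = backbone(x)                       # output1: taken directly from the backbone
logits = classifier(features)                # output2: taken from the fc layer

loss1 = F.mse_loss(features, feature_targets)    # loss constraining the backbone output
loss2 = F.cross_entropy(logits, labels)          # loss constraining the classifier output

(loss1 + loss2).backward()                   # a single backward pass handles both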