I’m following the tutorial here: http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
In the diagram, the input and the hidden state are combined and passed through the network, which produces both the next hidden state and the output.
My question is: what if I want to add another output path to the network (i.e., predict some binary label in addition to what the RNN in the diagram already does)? The network would then predict two things: (1) a binary label, and (2) the next word in the sequence.
Perhaps a simplified version of my question, expressed in code, will be easier to understand. Given the following network architecture:
```python
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, inputs):
        super(Model, self).__init__()
        self.linear1 = nn.Linear(inputs, 100)
        self.l2o1 = nn.Linear(100, 2)
        self.l2o2 = nn.Linear(100, 2)

    def forward(self, inputs):
        o = self.linear1(inputs)
        o1 = F.log_softmax(self.l2o1(o), dim=1)
        o2 = F.log_softmax(self.l2o2(o), dim=1)
        return o1, o2
```
How can I properly backpropagate the error through this network, which produces two binary predictions? Is this possible? My reasoning is that using both labels should impose a stronger constraint on the shared initial Linear layer (inputs -> a 100-dimensional representation). It also seems more efficient to train one network that predicts two things than two networks that each predict one.
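For reference, here is a self-contained sketch of what I think the training step would look like: compute a loss per head, sum them, and call backward() once so gradients flow into both heads and the shared layer. The model is redefined inline for completeness, and the input size, batch, targets, and learning rate are all placeholder values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, inputs):
        super(Model, self).__init__()
        self.linear1 = nn.Linear(inputs, 100)  # shared layer
        self.l2o1 = nn.Linear(100, 2)          # head 1
        self.l2o2 = nn.Linear(100, 2)          # head 2

    def forward(self, inputs):
        o = self.linear1(inputs)
        o1 = F.log_softmax(self.l2o1(o), dim=1)
        o2 = F.log_softmax(self.l2o2(o), dim=1)
        return o1, o2

model = Model(10)
criterion = nn.NLLLoss()  # expects log-probabilities, pairs with log_softmax
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 10)           # dummy batch of 4 examples
t1 = torch.randint(0, 2, (4,))   # dummy targets for head 1
t2 = torch.randint(0, 2, (4,))   # dummy targets for head 2

o1, o2 = model(x)
loss = criterion(o1, t1) + criterion(o2, t2)  # combined loss
optimizer.zero_grad()
loss.backward()   # one backward pass updates gradients for both heads
optimizer.step()
```

Since both heads share linear1, its gradient is the sum of the gradients contributed by each loss term.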
Furthermore, given that the RNN in the tutorial returns output, hidden, I assume I would need to modify it to return output, hidden, binary_output so the binary prediction from the secondary path is exposed. Am I correct in assuming that? And how should I calculate the loss, given that I currently call loss(output, target)? I would need to somehow incorporate the loss from binary_output as well.
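Concretely, I imagine the loss step would change to something like the sketch below. The tensors here are dummy stand-ins for what the modified forward() would return, and binary_target is a hypothetical label tensor that I would have to supply alongside the usual target.

```python
import torch
import torch.nn as nn

criterion = nn.NLLLoss()

# Dummy stand-ins for the values a modified forward() would return.
output = torch.log_softmax(torch.randn(1, 5, requires_grad=True), dim=1)
binary_output = torch.log_softmax(torch.randn(1, 2, requires_grad=True), dim=1)
target = torch.tensor([3])         # usual next-item target
binary_target = torch.tensor([1])  # hypothetical binary label

# Instead of loss(output, target) alone, sum the two loss terms.
loss = criterion(output, target) + criterion(binary_output, binary_target)
loss.backward()  # single backward pass covers both predictions
```

A weighting factor on the binary term (e.g. loss_main + alpha * loss_binary) could balance the two objectives if one dominates.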