I’m working on a sequence tagging algorithm; my inputs are sentence/label pairs:
Pytorch is so much better than tensorflow .   <- sentence
1       0  0  0    0      0    1          0   <- labels
I’m using a classic bi-LSTM with a softmax to get one prediction per timestep. At training time, I use minibatches of size
batch_size * max_sent_length * input_emb_size, where
max_sent_length is the length of the longest sentence in the batch; I zero-pad the others. I use
nn.utils.rnn.pack_padded_sequence so the LSTM computes only what is needed. Once I forward my input batch through my network, I need to compute the loss and backpropagate. I’m not sure how to properly compute the loss here:
- Since the sentences are zero-padded, the only solution I see is to iterate over each entry in the batch so I can ignore the zero-padded timesteps. Is there a better way?
- If I call the criterion function multiple times to compute the loss, do I have to call the
backward function each time? Or is there a smart way to accumulate the losses and call
backward only once?
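One approach I’ve seen sketched (assuming the tags are class indices and the head outputs one logit vector per timestep) is to set the padded positions of the label tensor to a sentinel value and pass that sentinel as ignore_index to nn.CrossEntropyLoss; the padded timesteps then contribute nothing to the loss or the gradient, no per-sentence loop is needed, and backward is called once on the single batch loss. The shapes and the PAD_LABEL value below are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 3 padded sentences, 8 timesteps, 4 tag classes.
batch_size, max_sent_length, num_tags = 3, 8, 4
PAD_LABEL = -100  # sentinel for padded timesteps

# Stand-in for the bi-LSTM head's output: (batch, time, tags).
logits = torch.randn(batch_size, max_sent_length, num_tags, requires_grad=True)

# Labels, with every position past each sentence's true length
# overwritten by the sentinel.
labels = torch.randint(0, num_tags, (batch_size, max_sent_length))
lengths = torch.tensor([8, 5, 3])
for i, length in enumerate(lengths):
    labels[i, length:] = PAD_LABEL

# ignore_index makes the loss (and its gradient) skip padded timesteps,
# so the whole batch is handled in one criterion call.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_LABEL)
loss = criterion(logits.reshape(-1, num_tags), labels.reshape(-1))
loss.backward()  # a single backward for the whole batch
```

If you do end up with several loss terms (e.g. one criterion call per sentence), you can also just sum the scalar losses into one tensor and call backward once on the sum; autograd accumulates through the addition, so a backward per term isn’t required.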