Hi everyone, I am learning about LSTMs and tried the official PyTorch LSTM example, whose training loop looks like this:
```python
for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Variables of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
```
However, I have a question about the backpropagation:
```python
loss = loss_function(tag_scores, targets)
loss.backward()
optimizer.step()
```
These three lines do not seem to reference the sequence steps at all, yet I thought an LSTM has to be trained with backpropagation through time (BPTT). Could you explain why this works? Also, when should BPTT be applied explicitly, and how would I implement it in PyTorch? Thank you in advance.
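To make the question concrete, here is my current (possibly wrong) sketch of *truncated* BPTT on one long sequence: the sequence is split into chunks, `loss.backward()` propagates gradients only within the current chunk, and the hidden state is detached before the next chunk so gradients stop there. The model, chunk size, and data below are all made up purely for illustration:

```python
import torch
import torch.nn as nn

# Toy setup (made-up shapes, just for illustration)
torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 4)
optimizer = torch.optim.SGD(
    list(lstm.parameters()) + list(head.parameters()), lr=0.1
)
loss_fn = nn.MSELoss()

seq = torch.randn(1, 20, 4)     # one long input sequence of length 20
target = torch.randn(1, 20, 4)  # matching targets
chunk = 5                       # truncation length

hidden = None  # let the LSTM initialize (h_0, c_0) to zeros
for start in range(0, seq.size(1), chunk):
    x = seq[:, start:start + chunk]
    y = target[:, start:start + chunk]

    optimizer.zero_grad()
    out, hidden = lstm(x, hidden)
    loss = loss_fn(head(out), y)
    loss.backward()             # BPTT, but only within this chunk
    optimizer.step()

    # Detach so the next chunk's backward pass stops at this boundary
    hidden = tuple(h.detach() for h in hidden)
```

In the official example, by contrast, each sentence is fed to the model in one call and `loss.backward()` alone unrolls through the whole sentence, so no chunking loop like this appears. Is that the right way to understand it?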