I am using an LSTM for a sequential classification task. I am attempting to backpropagate through time (non-truncated) and update the parameters after processing each file in my training set. However, I am a bit confused about how to implement BPTT.
Here is a snippet of my code.
for file in file_list:
    # Clear existing gradients
    optimizer.zero_grad()
    # Load features and labels for current file
    x_batch, y_batch = read_feat_file(file, conf_dict)
    # Normalize features
    x_batch = scaler.transform(x_batch)
    # Encode labels as integers
    y_batch = le.transform(y_batch).astype('long')
    # Move to GPU
    x_batch = torch.from_numpy(x_batch).to(device)
    y_batch = torch.from_numpy(y_batch).to(device)
    # Get outputs
    train_outputs = model(x_batch)
    # Calculate loss
    loss = F.nll_loss(train_outputs, y_batch, reduction='sum')
    losses.append(loss.detach().cpu().numpy())
    # Backpropagate and update weights
    loss.backward()
    optimizer.step()
However, based on other posts such as this one, it seems that the correct way to perform non-truncated BPTT is to feed the model one time step at a time, passing the hidden state from time step t-1 back into the model at time step t, and then call .backward() once at the end of the sequence.
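If I understand those posts correctly, the per-time-step version would look roughly like the sketch below. This is only my reading of the approach, not code from those posts: the init_hidden() helper and the assumption that model wraps an nn.LSTMCell (taking one frame plus the previous (h, c) state and returning log-probabilities plus the new state) are my own.

optimizer.zero_grad()
# Hypothetical sketch: assumes `model` wraps an nn.LSTMCell and exposes an
# init_hidden() helper returning zeroed (h, c) states; both are my assumptions
h, c = model.init_hidden()
total_loss = 0.0
for t in range(x_batch.shape[0]):
    # Feed one frame plus the previous hidden state; keep the returned state
    out, (h, c) = model(x_batch[t].unsqueeze(0), (h, c))
    total_loss = total_loss + F.nll_loss(out, y_batch[t].unsqueeze(0), reduction='sum')
# A single backward call at the end unrolls the graph through every time step
total_loss.backward()
optimizer.step()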
My question: As written, is my code actually performing backpropagation through time?