I have a very similar problem: I am trying to unroll a recurrent neural network, but in my case I don't use truncated backprop, just full BPTT. The network takes its previous output as an input, and my training code looks like this:
```python
for epoch in range(1, args.epochs + 1):
    model.train()
    epoch_loss = 0
    for i, sequence in enumerate(training_data_loader):
        optimizer.zero_grad()
        loss = 0
        # Initial "previous output" fed into the first recurrent step
        output = torch.zeros(sequence['input'].size(0), 3,
                             sequence['input'].size(3),
                             sequence['input'].size(4)).cuda(args.gpu, non_blocking=True)
        # Unroll over the time dimension, feeding the previous output back in
        for j in range(sequence['input'].size(2) - 1):
            inputs = torch.index_select(sequence['input'], 2,
                                        torch.tensor([j, j + 1])).cuda(args.gpu, non_blocking=True)
            t = torch.squeeze(torch.index_select(sequence['target'], 2,
                                                 torch.tensor([j + 1])), 2).cuda(args.gpu, non_blocking=True)
            output, l = model(inputs, output, t, i, writer, im_out)
            loss += l
        # Backpropagate once through the whole unrolled sequence
        loss.backward()
        optimizer.step()
```
It looks like the gradient is not flowing backwards through the unrolled steps. Do you know what the issue could be?
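For reference, here is a stripped-down, self-contained sketch of the pattern I mean, with a placeholder model, shapes, and random data rather than my real code (my actual model also returns the loss, which I've replaced with an external criterion here):

```python
import torch
import torch.nn as nn

# Placeholder recurrent step: takes the current input and the previous
# output, returns the new output (not my real architecture).
class Step(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, x, prev_out):
        return self.conv(torch.cat([x, prev_out], dim=1))

model = Step()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

inputs = torch.randn(2, 3, 5, 16, 16)   # (batch, channels, time, H, W)
targets = torch.randn(2, 3, 5, 16, 16)

optimizer.zero_grad()
loss = 0
output = torch.zeros(2, 3, 16, 16)       # initial "previous output"
for j in range(inputs.size(2)):
    output = model(inputs[:, :, j], output)      # feed previous output back in
    loss = loss + criterion(output, targets[:, :, j])
loss.backward()                                   # one backward over the full unroll
optimizer.step()

# If the graph is intact, every parameter should have a gradient here.
print(all(p.grad is not None for p in model.parameters()))
```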