Implementing Truncated Backpropagation Through Time

I have a very similar problem: I am trying to unroll a recurrent neural network, but in my case I don't use truncated backprop, just plain BPTT. The network takes the previous output as input, and my training code looks like this:

for epoch in range(1, args.epochs + 1):
    model.train()
    epoch_loss = 0
    for i, sequence in enumerate(training_data_loader):
        optimizer.zero_grad()
        loss = 0

        # Initial "previous output" fed into the first unrolled step
        output = torch.zeros(sequence['input'].size(0), 3, sequence['input'].size(3), sequence['input'].size(4)).cuda(args.gpu, non_blocking=True)

        # Unroll over the time dimension, feeding each step's output back in
        for j in range(sequence['input'].size(2) - 1):
            inputs = torch.index_select(sequence['input'], 2, torch.tensor([j, j + 1])).cuda(args.gpu, non_blocking=True)
            t = torch.squeeze(torch.index_select(sequence['target'], 2, torch.tensor([j + 1])), 2).cuda(args.gpu, non_blocking=True)
            output, l = model(inputs, output, t, i, writer, im_out)
            loss += l  # accumulate per-step losses so backward runs over the whole sequence

        loss.backward()
        optimizer.step()
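
In case it helps, here is a stripped-down, self-contained version of the same pattern, with a dummy one-layer step (DummyStep) standing in for my real network; the shapes and the MSE loss are just placeholders, not what I actually use:

import torch
import torch.nn as nn

# Dummy stand-in for my real network: takes the current input and the
# previous output, returns the new output and a per-step loss.
class DummyStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(2 * dim, dim)

    def forward(self, x, prev_out, target):
        out = self.fc(torch.cat([x, prev_out], dim=1))
        return out, nn.functional.mse_loss(out, target)

dim, steps = 8, 5
model = DummyStep(dim)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(steps, 4, dim)        # (time, batch, features)
target = torch.randn(steps, 4, dim)

optimizer.zero_grad()
loss = 0
output = torch.zeros(4, dim)          # initial "previous output"
for j in range(steps):
    output, l = model(x[j], output, target[j])
    loss += l                          # keep every step in the graph
loss.backward()                        # backward over the whole unrolled sequence
optimizer.step()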

It looks like the gradient is not flowing backwards; do you know what the issue could be?
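
The way I've been checking is by looking at the parameter gradients right after the backward pass. Roughly this kind of sanity check, here with a throwaway nn.Linear just to illustrate (the module and shapes are not my real setup):

import torch
import torch.nn as nn

# After loss.backward(), every parameter that took part in the unrolled
# steps should have a non-None, non-zero .grad.
step = nn.Linear(4, 4)
output = torch.zeros(2, 4)           # initial "previous output"
loss = 0
for j in range(5):                   # unroll a few steps
    output = torch.tanh(step(output + torch.randn(2, 4)))
    loss = loss + output.pow(2).mean()
loss.backward()

for name, p in step.named_parameters():
    print(name, 'grad is None' if p.grad is None else f'grad norm {p.grad.norm().item():.4f}')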
