Is it necessary to use packed sequences for variable-length RNN inputs when using mini-batch training?

Hey all

I’m currently looking at padding time series for an LSTM implementation of mine, and I’m trying to wrap my head around the use of packed sequences. Why shouldn’t I just pad my sequences with float('nan'), then search for nan in the output and use that to filter my loss function? This may be specific to my case, but I only care about the final output at the end of each time series. So, in the code below, couldn’t I just use the known lengths of my sequences to pull out the values just before the nans and use those to calculate my loss?

In [64]: import torch

In [65]: import torch.nn as nn

In [66]: x = [[3,5,6,4,float('nan')],
    ...:     [3,5,4,float('nan'),float('nan')],
    ...:     [4,5,6,7,5]]

In [70]: xT = torch.tensor(x).view(3,5,1)

In [73]: lstm = nn.LSTM(1,3,1,batch_first=True)

In [75]: o,_ = lstm(xT)

In [77]: lin = nn.Linear(3,1)

In [78]: lin(o)
Out[78]: 
tensor([[[-0.2357],
         [-0.2539],
         [-0.2588],
         [-0.2561],
         [    nan]],

        [[-0.2357],
         [-0.2539],
         [-0.2551],
         [    nan],
         [    nan]],

        [[-0.2414],
         [-0.2560],
         [-0.2619],
         [-0.2622],
         [-0.2624]]])
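For concreteness, here is a sketch of the filtering step I have in mind, continuing from the session above (lengths [4, 3, 5] are the known true lengths; target is just a dummy placeholder for my real labels):

out = lin(o)                            # (3, 5, 1), nan past each true length
lengths = torch.tensor([4, 3, 5])       # known true length of each sequence
idx = (lengths - 1).view(-1, 1, 1)      # index of the last valid time step
last = out.gather(1, idx).squeeze(1)    # (3, 1): the values just before the nans
target = torch.zeros(3, 1)              # dummy targets, just to show the loss step
loss = ((last - target) ** 2).mean()    # the loss only ever sees the valid outputs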

My apologies if this is a stupid question; I’m still fairly new to PyTorch and would like to understand this.
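For comparison, this is roughly what I think the packed-sequence version would look like, going by the docs. If I understand correctly, the pad value is irrelevant here, because pack_padded_sequence drops the padded steps before the LSTM ever sees them (note: enforce_sorted=False needs a reasonably recent PyTorch; otherwise the batch has to be sorted by length first):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

x = torch.tensor([[3., 5., 6., 4., 0.],   # pad value is arbitrary: these
                  [3., 5., 4., 0., 0.],   # steps are dropped by the packing
                  [4., 5., 6., 7., 5.]]).view(3, 5, 1)
lengths = torch.tensor([4, 3, 5])

lstm = nn.LSTM(1, 3, 1, batch_first=True)
lin = nn.Linear(3, 1)

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
_, (h_n, _) = lstm(packed)   # h_n holds each sequence's state at its true last step
final = lin(h_n[-1])         # (3, 1): the same "last valid output", without any nans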

Thanks!

Hey @Clint, did you find a solution? I’m stuck on the same problem: I’m using a time series and cannot pad with 0.