Hey all

I’m currently looking at padding time-series data for an LSTM implementation of mine, and I’m trying to wrap my head around the use of packed sequences. I’m wondering: why shouldn’t I just pad my sequences with `float('nan')`, then search for NaN in the output and use that to mask my loss? This may be specific to my case, but I only care about the final output at the end of each time-series. So, for example in the code below, couldn’t I just use the known lengths of my sequences to pull out the values just before the NaNs and use those to calculate my loss?

```
In [66]: x = [[3,5,6,4,float('nan')],
...: [3,5,4,float('nan'),float('nan')],
...: [4,5,6,7,5]]
In [70]: xT = torch.tensor(x).view(3,5,1)
In [73]: lstm = nn.LSTM(1,3,1,batch_first=True)
In [75]: o,_ = lstm(xT)
In [77]: lin = nn.Linear(3,1)
In [78]: lin(o)
Out[78]:
tensor([[[-0.2357],
[-0.2539],
[-0.2588],
[-0.2561],
[ nan]],
[[-0.2357],
[-0.2539],
[-0.2551],
[ nan],
[ nan]],
[[-0.2414],
[-0.2560],
[-0.2619],
[-0.2622],
[-0.2624]]])
```
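To make it concrete, here’s a sketch of what I mean by “pull out the values just before the NaNs”, assuming `batch_first=True` and that I already track the true sequence lengths myself (the `lengths` tensor below is my own bookkeeping, not part of any API):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Padded batch: 3 sequences, max length 5, feature size 1.
# NaN marks the padding positions.
nan = float('nan')
x = torch.tensor([[3., 5., 6., 4., nan],
                  [3., 5., 4., nan, nan],
                  [4., 5., 6., 7., 5.]]).view(3, 5, 1)
lengths = torch.tensor([4, 3, 5])  # true lengths, known in advance

lstm = nn.LSTM(input_size=1, hidden_size=3, num_layers=1, batch_first=True)
lin = nn.Linear(3, 1)

out, _ = lstm(x)   # (3, 5, 3); NaN from the first padded step onward
y = lin(out)       # (3, 5, 1)

# Select the last *valid* time step of each sequence: index lengths - 1.
# Since the LSTM runs left to right, these positions come before any NaN
# input and so are unaffected by the padding.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, y.size(-1))  # (3, 1, 1)
last = y.gather(1, idx).squeeze(1)                            # (3, 1)

print(last)  # no NaNs: only valid positions were selected
```

Those `last` values are what I’d feed into the loss instead of masking the NaN positions after the fact.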

My apologies if this is a stupid question; I’m still fairly new to PyTorch and would like to understand this.

Thanks!