I have sequences that I padded to a fixed length (365 days) by inserting zeros at the missing time steps (so the padding occurs at varying positions within each sequence). I then feed the sequences into an LSTM network in order to classify them.

I created a mask which contains `True` if the value is 0 (padding) and `False` otherwise, so that the model does not take the zeros into account (I checked the masks, and they do contain `False` values, meaning those time steps are not padding and should be taken into account by the model).

However, for some of my sequences, the output of the backbone (before applying the linear layer for classification) consists of **only** `nan` values.

Does anyone know why? Am I doing something wrong?

Is it correct to apply the masking before feeding the values through the LSTM layer? I also tried applying the masking afterwards; in that case, `out` contains float values, but once the masking is applied, all values are `-inf`.

```
def forward(self, x, device):
    mask = x[:, :, 0].eq(0).unsqueeze(-1)
    mask = mask.to(device)  # [batch_size, seq_len, 1] = [16, 365, 1]
    # masking out padded time steps
    x = x.masked_fill(mask.bool(), -np.inf)  # now only some values are -inf, as expected
    x = x.float()  # [batch_size, seq_len, channels] = [16, 365, 3]
    # initial hidden state
    h0 = (
        torch.zeros(layer_dim, x.size(0), hidden_dim)
        .requires_grad_()
        .to(device)
    )
    # initial cell state
    c0 = (
        torch.zeros(layer_dim, x.size(0), hidden_dim)
        .requires_grad_()
        .to(device)
    )
    # [batch_size, seq_len, hidden_dim] = [16, 365, 150]
    out, _ = self._lstm_layer(x, (h0.detach(), c0.detach()))  # now all values are nan
```
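For reference, here is a minimal standalone sketch (shapes and names are made up, not my real model) that reproduces the same behaviour. I pin the input weights to mixed signs so that the matrix product over a `-inf` input contains both `-inf` and `+inf` terms, which sum to `nan`; the `nan` then propagates through the recurrence to every later time step:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fresh LSTM with 3 input channels, as in my data
lstm = nn.LSTM(input_size=3, hidden_size=4, batch_first=True)

# Force mixed-sign input weights: for a -inf input, each gate pre-activation
# becomes (-inf) + (+inf) + (-inf) = nan, regardless of random init
with torch.no_grad():
    lstm.weight_ih_l0.copy_(torch.tensor([[1.0, -1.0, 1.0]]).repeat(16, 1))

x = torch.zeros(1, 5, 3)      # [batch, seq_len, channels]
x[:, 1:3, :] = float("-inf")  # pretend steps 1-2 are "masked" padding

out, _ = lstm(x)
print(torch.isnan(out).any().item())  # True: every step from the first -inf onward is nan
```

Step 0 (a zero input) still gives finite outputs, but from the first `-inf` step onward every hidden state is `nan`, which matches what I see on my padded sequences.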