Hi all,

When I use an LSTM layer to extract high-level deep-learning features from pre-trained RoBERTa features, defined as `self.lstm = nn.LSTM(1024, 256, 2, batch_first=False, bidirectional=True)`, the output contains a lot of zeros, as shown in the printout below. Why does this happen, and how can I fix it?
```
text tensor([[[0.0000, 0.0000, 0.2246,  ..., 0.0000, 0.0966, 0.1180],
         [0.0000, 0.0000, 0.0475,  ..., 0.0442, 0.0149, 0.0588]],

        [[0.0000, 0.0000, 0.3146,  ..., 0.0000, 0.0822, 0.2287],
         [0.0000, 0.0000, 0.1793,  ..., 0.0000, 0.0000, 0.2323]],

        [[0.0000, 0.0000, 0.2645,  ..., 0.0000, 0.0585, 0.2474],
         [0.0000, 0.0000, 0.2497,  ..., 0.0000, 0.0686, 0.2068]],

        ...,

        [[0.0000, 0.0000, 0.0288,  ..., 0.0000, 0.0358, 0.0000],
         [0.0000, 0.0000, 0.0535,  ..., 0.0000, 0.0000, 0.0470]],

        [[0.0000, 0.0000, 0.0260,  ..., 0.0000, 0.0000, 0.0591],
         [0.0000, 0.0000, 0.0458,  ..., 0.0000, 0.0000, 0.0327]],

        [[0.1995, 0.0000, 0.3716,  ..., 0.0000, 0.0908, 0.3951],
         [0.0000, 0.0028, 0.0427,  ..., 0.0000, 0.0000, 0.0246]]],
       device='cuda:0', grad_fn=<ReluBackward0>) torch.Size([11, 2, 64])
```
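
For reference, here is a minimal, self-contained sketch of my setup. The `TextEncoder` class name, the dummy input, and the `nn.Linear(512, 64)` projection are hypothetical reconstructions (the printed tensor has shape `[11, 2, 64]`, while the raw bidirectional LSTM output would be `(11, 2, 512)`, so some layer must reduce 512 to 64); the `grad_fn=<ReluBackward0>` in the printout shows a ReLU is applied after these layers.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # 1024-dim pre-trained RoBERTa features -> 2-layer bidirectional LSTM
        self.lstm = nn.LSTM(1024, 256, 2, batch_first=False, bidirectional=True)
        # Hypothetical projection to match the printed shape [11, 2, 64];
        # the raw bidirectional output is 2 * 256 = 512-dim.
        self.proj = nn.Linear(512, 64)

    def forward(self, x):
        # x: (seq_len, batch, 1024) because batch_first=False
        out, _ = self.lstm(x)  # out: (seq_len, batch, 512)
        # ReLU (matching grad_fn=<ReluBackward0>) clamps every negative
        # activation to exactly 0.0, producing many zeros in the output.
        return torch.relu(self.proj(out))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = TextEncoder().to(device)
feats = torch.randn(11, 2, 1024, device=device)  # dummy stand-in for RoBERTa features
out = model(feats)
print('text', out, out.shape)  # roughly half the entries come out as 0.0000
```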