Hi all,

When I use an LSTM layer to extract high-level deep-learning features from pre-trained RoBERTa features, defined as `self.lstm = nn.LSTM(1024, 256, 2, batch_first=False, bidirectional=True)`, the output contains a lot of zeros, as shown in the printout below. Why does this happen, and how can I fix it?
```
text tensor([[[0.0000, 0.0000, 0.2246,  ..., 0.0000, 0.0966, 0.1180],
         [0.0000, 0.0000, 0.0475,  ..., 0.0442, 0.0149, 0.0588]],

        [[0.0000, 0.0000, 0.3146,  ..., 0.0000, 0.0822, 0.2287],
         [0.0000, 0.0000, 0.1793,  ..., 0.0000, 0.0000, 0.2323]],

        [[0.0000, 0.0000, 0.2645,  ..., 0.0000, 0.0585, 0.2474],
         [0.0000, 0.0000, 0.2497,  ..., 0.0000, 0.0686, 0.2068]],

        ...,

        [[0.0000, 0.0000, 0.0288,  ..., 0.0000, 0.0358, 0.0000],
         [0.0000, 0.0000, 0.0535,  ..., 0.0000, 0.0000, 0.0470]],

        [[0.0000, 0.0000, 0.0260,  ..., 0.0000, 0.0000, 0.0591],
         [0.0000, 0.0000, 0.0458,  ..., 0.0000, 0.0000, 0.0327]],

        [[0.1995, 0.0000, 0.3716,  ..., 0.0000, 0.0908, 0.3951],
         [0.0000, 0.0028, 0.0427,  ..., 0.0000, 0.0000, 0.0246]]],
       device='cuda:0', grad_fn=<ReluBackward0>) torch.Size([11, 2, 64])
```
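
For reference, here is a minimal, self-contained sketch of my setup. The `TextEncoder` class name, the dummy input, and the `nn.Linear(512, 64)` projection are hypothetical reconstructions (the printed tensor has shape `[11, 2, 64]`, while the raw bidirectional LSTM output would be `(11, 2, 512)`, so some layer must reduce 512 to 64); the `grad_fn=<ReluBackward0>` in the printout shows a ReLU is applied after these layers.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # 1024-dim pre-trained RoBERTa features -> 2-layer bidirectional LSTM
        self.lstm = nn.LSTM(1024, 256, 2, batch_first=False, bidirectional=True)
        # Hypothetical projection to match the printed shape [11, 2, 64];
        # the raw bidirectional output is 2 * 256 = 512-dim.
        self.proj = nn.Linear(512, 64)

    def forward(self, x):
        # x: (seq_len, batch, 1024) because batch_first=False
        out, _ = self.lstm(x)  # out: (seq_len, batch, 512)
        # ReLU (matching grad_fn=<ReluBackward0>) clamps every negative
        # activation to exactly 0.0, producing many zeros in the output.
        return torch.relu(self.proj(out))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = TextEncoder().to(device)
feats = torch.randn(11, 2, 1024, device=device)  # dummy stand-in for RoBERTa features
out = model(feats)
print('text', out, out.shape)  # roughly half the entries come out as 0.0000
```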