When I put the output of Conv1d into an LSTM, I randomly get 'RuntimeError: CUDA out of memory'. Why do I get this error?

First of all, my model is shown below, and I am using PyTorch.
When the model passes the output of Conv1d to the LSTM, sometimes I get the RuntimeError and sometimes nothing happens. I want to know why. Please help me!

model => CNNModel(
  (embed): Embedding(10447, 256, padding_idx=1)
  (cnn): Conv1d(256, 64, kernel_size=(3,), stride=(1,))
  (lstm): LSTM(64, 64, num_layers=64, batch_first=True, bidirectional=True)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=128, out_features=1, bias=True)
  (sigmoid): Sigmoid()
  (loss_fn): BCELoss()
)
optimizer => Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)
train:   0%|                                                                  | 0/11 [00:00<?, ?it/s]batch.word[0].size() => torch.Size([64, 257])
embed.size() => torch.Size([64, 256, 257])
conv.size() => torch.Size([64, 255, 64])
h.size() => torch.Size([64, 1])
loss => 0.7036488056182861
done
train:   9%|█████▎                                                    | 1/11 [00:10<01:43, 10.35s/it]batch.word[0].size() => torch.Size([64, 311])
embed.size() => torch.Size([64, 256, 311])
conv.size() => torch.Size([64, 309, 64])
h.size() => torch.Size([64, 1])
loss => 0.6949299573898315
done
train:  18%|██████████▌                                               | 2/11 [00:23<01:41, 11.29s/it]batch.word[0].size() => torch.Size([64, 939])
embed.size() => torch.Size([64, 256, 939])
conv.size() => torch.Size([64, 937, 64])

I think maybe this is caused by a mismatch between the shape of the Conv's output and what the LSTM expects.
But I don't know why conv.size() => torch.Size([64, 309, 64]) works fine, while conv.size() => torch.Size([64, 937, 64]) does not.
How can I fix this error? What should I do?

I'm not sure how LSTM is implemented, but I think that by default it keeps some state information for each step, and 937 is a rather long sequence. There seem to be some ways to circumvent this. Try googling for "pytorch lstm sequence length memory detach" or something like that. It has been addressed in some places, but I can't say that I have fully understood it. Maybe this helps you hunt down your error.
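
For what it's worth, the pattern those threads usually describe is truncated backpropagation through time: run the sequence through the LSTM in fixed-size chunks, call backward() per chunk, and detach the hidden state at the chunk boundary so the autograd graph never covers the whole sequence. A minimal, generic sketch (all names and sizes below are made up for illustration, not taken from your model):

import torch
import torch.nn as nn

# Toy setup: a unidirectional LSTM plus a small classification head.
lstm = nn.LSTM(input_size=64, hidden_size=64, num_layers=2, batch_first=True)
head = nn.Linear(64, 1)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

x = torch.randn(8, 937, 64)            # (batch, seq_len, features), e.g. a Conv1d output
y = torch.randint(0, 2, (8, 1)).float()
chunk_len = 128
state = None                           # (h_0, c_0); None means zero-initialized

for start in range(0, x.size(1), chunk_len):
    chunk = x[:, start:start + chunk_len, :]
    out, state = lstm(chunk, state)
    # Detach so gradients stop at the chunk boundary; this is what keeps
    # memory bounded no matter how long the full sequence is.
    state = tuple(s.detach() for s in state)

    logits = head(out[:, -1, :])       # predict from the last step of this chunk
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()                    # frees this chunk's graph
    optimizer.step()

One caveat: with a bidirectional LSTM like yours, chunking is not equivalent to running the full sequence, because the backward direction only ever sees the current chunk. So treat this as a sketch of the detach idea rather than a drop-in fix.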

Thank you for replying!