I’m trying to convert a BERT-LSTM model to an XLM-R-LSTM model. The complete BERT-LSTM code worked fine without any bugs. The `forward` function of the BERT-LSTM model is as follows.

```
def forward(self, sents):
    sents_tensor, masks_tensor, sents_lengths = sents_to_tensor(self.tokenizer, sents, self.device)
    encoded_layers, pooled_output = self.bert(input_ids=sents_tensor, attention_mask=masks_tensor, output_all_encoded_layers=False)
    encoded_layers = encoded_layers.permute(1, 0, 2)
    enc_hiddens, (last_hidden, last_cell) = self.lstm(pack_padded_sequence(encoded_layers, sents_lengths))
    output_hidden = torch.cat((last_hidden[0], last_hidden[1]), dim=1)
    output_hidden = self.dropout(output_hidden)
    pre_softmax = self.hidden_to_softmax(output_hidden)
    return pre_softmax
```

When I tried to use the same `forward` function to train the XLM-R-LSTM model, I got the following error:

`TypeError: forward() got an unexpected keyword argument 'output_all_encoded_layers'`

So I removed `output_all_encoded_layers=False` from the line

`encoded_layers, pooled_output = self.bert(input_ids=sents_tensor, attention_mask=masks_tensor, output_all_encoded_layers=False)`

and dropped the now-unused `pooled_output`.

This is the new forward function.

```
def forward(self, sents):
    sents_tensor, masks_tensor, sents_lengths = sents_to_tensor(self.tokenizer, sents, self.device)
    encoded_layers = self.bert(input_ids=sents_tensor, attention_mask=masks_tensor)
    encoded_layers = encoded_layers.permute(1, 0, 2)
    enc_hiddens, (last_hidden, last_cell) = self.lstm(pack_padded_sequence(encoded_layers, sents_lengths))
    output_hidden = torch.cat((last_hidden[0], last_hidden[1]), dim=1)
    output_hidden = self.dropout(output_hidden)
    pre_softmax = self.hidden_to_softmax(output_hidden)
    return pre_softmax
```

Now I get the following error:

```
AttributeError: 'tuple' object has no attribute 'permute'
```
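From the error, my understanding is that in the newer `transformers` API the model call returns a tuple (the last hidden state plus optional extras) rather than a bare tensor, so calling `.permute` on the returned value fails. A minimal sketch of what I think the fix would look like, using a stand-in tuple instead of the real model (the shapes below are illustrative, not from my actual model):

```
import torch

# Stand-in for `self.bert(...)`: a tuple whose first element is the last
# hidden state, shaped (batch, seq_len, hidden_size).
batch, seq_len, hidden = 2, 5, 8
outputs = (torch.zeros(batch, seq_len, hidden),)

encoded_layers = outputs[0]                       # unpack the tensor from the tuple
encoded_layers = encoded_layers.permute(1, 0, 2)  # -> (seq_len, batch, hidden_size)
print(encoded_layers.shape)
```

If this is the right direction, the change in my `forward` would just be `encoded_layers = self.bert(...)[0]`.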

How can I solve this?