Thank you @basingse! That solved the problem, but now there is a new error:
encoded_layers = encoded_layers.permute(1, 0, 2)
RuntimeError: number of dims don't match in permute
The main question I have is: why does this code throw so many errors for the XLM-R model when there were no errors for the BERT model?
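
For anyone debugging the same thing: `permute(1, 0, 2)` only works on a 3-D tensor, so a quick shape check before the call shows what `encoded_layers` actually is. Below is a minimal sketch, assuming `encoded_layers` is a `torch.Tensor`; the helper name `safe_permute` and the example shapes are made up for illustration and are not from the original code.

```python
import torch


def safe_permute(encoded_layers):
    # Hypothetical helper: fail with a readable message instead of the
    # generic permute error when the tensor is not 3-D.
    if encoded_layers.dim() != 3:
        raise ValueError(
            f"expected a 3-D tensor, got shape {tuple(encoded_layers.shape)}"
        )
    # Swap the first two dimensions, as in the failing line.
    return encoded_layers.permute(1, 0, 2)


# A 3-D tensor (shape chosen arbitrarily) permutes fine.
ok = safe_permute(torch.zeros(2, 5, 8))
print(tuple(ok.shape))

# A 2-D tensor triggers the same kind of RuntimeError as in the traceback.
try:
    torch.zeros(2, 8).permute(1, 0, 2)
except RuntimeError as err:
    print("permute failed:", err)
```

If the printed shape is not 3-D, or `encoded_layers` is not a tensor at all, then the model output probably has a different structure than the BERT code path expected.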