The size of tensor a (2) must match the size of tensor b (39) at non-singleton dimension 1

Hello,

I’m trying to train a CamemBERT Base Cased model in Google Colab, using the fast-bert library.

Sometimes the code runs fine the first time, without any error.

Other times, the same code, using the same data, results in a “CUDA out of memory” error.

Previously, restarting the runtime (or exiting and re-opening the notebook, or doing a factory reset of the runtime) and then re-running the code made it run successfully. Just now, though, I’ve restarted and re-run many times and got the error every time, even after reducing the batch size to 1.

Does anyone know why this is happening, why it is intermittent, and/or what I can do about it?

When it does run, after reducing the batch size per GPU from 16 to 8, I get this error:

RuntimeError: The size of tensor a (2) must match the size of tensor b (39) at non-singleton dimension 1

I know it’s caused by changing the batch size, but I have no idea how to fix it.
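For context, as far as I understand it, this message comes from PyTorch’s broadcasting check: two tensors can be combined element-wise only if, comparing their shapes from the rightmost dimension, each pair of sizes is equal or one of them is 1. Below is a rough pure-Python sketch of that rule (not PyTorch’s actual code; `broadcast_shape` is a hypothetical name). The shapes (8, 2) and (8, 39) are my guess, since the error only reports the sizes at dimension 1, which is the second dimension.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two tensor shapes, mimicking PyTorch's rule.

    Shapes are aligned on the right; each pair of sizes must be equal or
    contain a 1. Otherwise a RuntimeError is raised with a message in the
    same format PyTorch uses.
    """
    ndim = max(len(shape_a), len(shape_b))
    result = [0] * ndim
    # Walk from the last (rightmost) dimension towards the first.
    for i in range(ndim - 1, -1, -1):
        offset = ndim - 1 - i
        dim_a = len(shape_a) - 1 - offset
        dim_b = len(shape_b) - 1 - offset
        # A missing leading dimension behaves like a dimension of size 1.
        size_a = shape_a[dim_a] if dim_a >= 0 else 1
        size_b = shape_b[dim_b] if dim_b >= 0 else 1
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise RuntimeError(
                f"The size of tensor a ({size_a}) must match the size of "
                f"tensor b ({size_b}) at non-singleton dimension {i}"
            )
        # A size-1 dimension stretches to match the other size.
        result[i] = size_b if size_a == 1 else size_a
    return tuple(result)


print(broadcast_shape((8, 1), (8, 39)))  # prints (8, 39)
try:
    broadcast_shape((8, 2), (8, 39))  # hypothetical shapes
except RuntimeError as err:
    print(err)  # same wording as the error above, at dimension 1
```

If this reading is right, the two tensors disagree in their second dimension (2 vs 39), not in the batch dimension 0.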