CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle) when training "bert-base-uncased"

I’m running into this error when training “bert-base-uncased”.

The metric I’m using is

metric = evaluate.combine(["accuracy", "f1", "precision", "recall"])

and the training code is

fold_trainer = Trainer(

My dataset is 700 texts. The code works fine on Google Colab, but it is time-consuming, so I switched to a server, where I run it from the terminal in a virtual env with all the needed packages installed. However, I keep running into this problem there. How can I solve this? Why would it run on Colab but not on the server?

I have tried setting CUDA_LAUNCH_BLOCKING=1, but it did not print anything, even after 7 hours.

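You might be running out of GPU memory: cublasCreate has to allocate memory on the device, so it can fail with this error when the GPU is already full. Try reducing the batch size and rerunning your code.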
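As a rough sketch of what "reduce the batch size" could look like in practice: the helper below (a hypothetical name, not part of transformers) lists smaller batch sizes to try, doubling the gradient accumulation steps each time so the effective batch size stays the same. The starting value of 16 is an assumption, since the training arguments are not shown in the question.

```python
def batch_size_candidates(start_batch_size, start_accum_steps=1):
    """Hypothetical helper: list (batch_size, accumulation_steps) pairs to try.

    Each step halves the per-device batch size and doubles the gradient
    accumulation steps, so batch_size * accumulation_steps is unchanged.
    Halving stops once the batch size is odd (including 1).
    """
    if start_batch_size < 1 or start_accum_steps < 1:
        raise ValueError("batch size and accumulation steps must be >= 1")

    candidates = [(start_batch_size, start_accum_steps)]
    batch_size, accum_steps = start_batch_size, start_accum_steps
    while batch_size % 2 == 0:
        batch_size //= 2
        accum_steps *= 2
        candidates.append((batch_size, accum_steps))
    return candidates


# Assumed starting point of 16; replace with the batch size you actually use.
for batch_size, accum_steps in batch_size_candidates(16):
    print(f"try batch size {batch_size} with {accum_steps} accumulation step(s)")
```

Each pair would go into TrainingArguments as per_device_train_batch_size and gradient_accumulation_steps. Work down the list until the error stops. If it still fails at a batch size of 1, memory is probably not the cause.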

