What's the best way to handle the exception "cuda runtime error (2) : out of memory"?

You could try the approach from FairSeq in this thread.
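
As I understand it, the idea is to catch the `RuntimeError`, check that its message mentions "out of memory", skip that batch, and free memory before moving on. Here is a minimal sketch of that pattern, assuming PyTorch. The names `run_step_skipping_oom`, `step_fn`, and `on_oom` are made up for illustration and are not FairSeq's actual code:

```python
def run_step_skipping_oom(step_fn, batch, on_oom=None):
    """Run step_fn(batch) and skip the batch if it runs out of memory.

    Returns (result, True) on success and (None, False) if an
    out-of-memory RuntimeError was caught. Other errors are re-raised.
    All names here are hypothetical, chosen for this example.
    """
    oom = False
    try:
        result = step_fn(batch)
    except RuntimeError as err:
        # Only swallow out-of-memory errors; anything else is a real bug.
        if "out of memory" not in str(err):
            raise
        oom = True

    if oom:
        # Clean up outside the except block so the traceback (and the
        # tensors its frames reference) is no longer being held.
        if on_oom is not None:
            on_oom()
        return None, False
    return result, True
```

In a training loop, `on_oom` could be a small function that calls `optimizer.zero_grad()` and `torch.cuda.empty_cache()`, and the loop would just `continue` when the second return value is `False`. If many batches are skipped this way, lowering the batch size is probably the better fix.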