FairSeq recovers from OOM errors during both training and validation.
Have a look at this code.
Depending on where the OOM error occurred, they either skip the batch or clear the cache.
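
As a rough illustration of that pattern, here is a minimal sketch. It is not FairSeq's actual code: the helper names (`is_oom_error`, `free_cached_memory`, `train_step_or_skip`, `valid_step_with_retry`) and the `step_fn` callback are hypothetical, and it assumes the OOM surfaces as a `RuntimeError` whose message contains "out of memory". The only PyTorch calls used are `torch.cuda.is_available()`, `torch.cuda.empty_cache()` and `optimizer.zero_grad()`.

```python
try:
    import torch  # optional here so the sketch also runs without PyTorch installed
except ImportError:
    torch = None


def is_oom_error(exc):
    """Assumption: an OOM shows up as a RuntimeError mentioning 'out of memory'."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc)


def free_cached_memory():
    """Release cached blocks held by PyTorch's CUDA allocator, if CUDA is in use."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def train_step_or_skip(step_fn, batch, optimizer=None):
    """Run one training step; on OOM, skip the batch and return None."""
    try:
        return step_fn(batch)
    except RuntimeError as exc:
        if not is_oom_error(exc):
            raise  # unrelated errors are not swallowed
        print("WARNING: ran out of memory, skipping batch")
    # Clean up outside the except block, so the exception (and the tensors
    # referenced by its traceback) is already released at this point.
    if optimizer is not None:
        optimizer.zero_grad()  # discard gradients from the failed step
    free_cached_memory()
    return None


def valid_step_with_retry(step_fn, batch):
    """Run one validation step; on OOM, clear the cache and retry once."""
    try:
        return step_fn(batch)
    except RuntimeError as exc:
        if not is_oom_error(exc):
            raise
        print("WARNING: ran out of memory in validation, retrying batch")
    free_cached_memory()
    return step_fn(batch)  # a second OOM propagates to the caller
```

In a training loop you would call `train_step_or_skip(step_fn, batch, optimizer)` for each batch and simply continue when it returns `None`. Skipping is reasonable for training, where losing one batch rarely matters, while validation retries so that every batch still contributes to the metric.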