When using DataParallel (for example, with 2 GPUs), is checkpointing slow?
I believe that the model replicas must synchronize at every checkpoint during the backward pass. How can I check and confirm this?
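
One way this might be checked is to time a training step with and without checkpointing, once on a single GPU and once under DataParallel, and compare the slowdown. The sketch below assumes that "checkpointing" here means activation checkpointing via `torch.utils.checkpoint` in PyTorch; the model `Net`, its sizes, and the helper `time_step` are hypothetical and only for illustration. It falls back to CPU when no GPU is present, and it assumes a PyTorch version that accepts `use_reentrant=False`.

```python
import time

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Net(nn.Module):
    """Hypothetical toy model: a stack of Linear+ReLU blocks."""

    def __init__(self, use_checkpoint, width=256, depth=4):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU())
            for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpoint:
                # Activations inside the block are recomputed during backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


def time_step(use_checkpoint, steps=5):
    """Return the average seconds per forward+backward step."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Net(use_checkpoint).to(device)
    if torch.cuda.device_count() > 1:
        # Wrap only when several GPUs are visible.
        model = nn.DataParallel(model)
    x = torch.randn(64, 256, device=device)

    def sync():
        # CUDA kernels run asynchronously, so wait before reading the clock.
        if device == "cuda":
            torch.cuda.synchronize()

    model(x).sum().backward()  # warm-up step, not timed
    sync()
    start = time.perf_counter()
    for _ in range(steps):
        model.zero_grad()
        model(x).sum().backward()
    sync()
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    plain = time_step(use_checkpoint=False)
    ckpt = time_step(use_checkpoint=True)
    print(f"no checkpoint: {plain:.4f} s/step")
    print(f"checkpoint:    {ckpt:.4f} s/step")
    print(f"slowdown:      {ckpt / plain:.2f}x")
```

Running this once with a single visible GPU (for example by setting `CUDA_VISIBLE_DEVICES=0`) and once with both would give two slowdown ratios. If they are similar, the extra cost is probably recomputation rather than cross-GPU synchronization. A much larger ratio with 2 GPUs would suggest something else is going on. A toy model this small may be dominated by launch overhead, so the sizes would need scaling up to get meaningful numbers.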