Checkpointing is slow on nn.DataParallel models

When checkpointing is used with nn.DataParallel (for example, on 2 GPUs), is it expected to be slow?
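A minimal sketch for measuring how much slower a training step gets with checkpointing. It assumes a recent PyTorch release and activation checkpointing via `torch.utils.checkpoint.checkpoint` with `use_reentrant=False`. The `Block` model, the `time_step`/`compare` helpers and all sizes are hypothetical. It times forward+backward with and without checkpointing under the same `nn.DataParallel` wrapper. `torch.cuda.synchronize()` brackets the timed loop because CUDA kernels run asynchronously. On a CPU-only machine `nn.DataParallel` just calls the wrapped module, so the sketch still runs but measures only recomputation cost.

```python
import time

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical toy model: a stack of Linear+ReLU layers, optionally checkpointed."""

    def __init__(self, dim=1024, depth=8, use_checkpoint=False):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for layer in self.layers:
            if self.use_checkpoint:
                # Activations inside `layer` are not kept; they are recomputed in backward.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x


def time_step(model, x, iters=10):
    """Average wall-clock seconds per forward+backward step."""

    def step():
        model.zero_grad(set_to_none=True)
        model(x).sum().backward()

    step()  # warm-up: CUDA context, kernel selection, DataParallel replication
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure all timed GPU work has finished
    return (time.perf_counter() - start) / iters


def compare(dim=1024, depth=8, batch=256, iters=10):
    """Time the same model with and without checkpointing, both wrapped in DataParallel."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(batch, dim, device=device)
    results = {}
    for use_ckpt in (False, True):
        torch.manual_seed(0)
        # DataParallel expects the module on the first device before wrapping.
        model = nn.DataParallel(Block(dim, depth, use_ckpt).to(device))
        results["checkpoint" if use_ckpt else "baseline"] = time_step(model, x, iters)
    return results


if __name__ == "__main__":
    print(compare())
```

Some slowdown is expected even on a single device, because each checkpointed segment runs its forward a second time during backward. Comparing the checkpoint/baseline ratio on one GPU and on two would suggest whether the multi-GPU case adds anything beyond that recomputation.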

I believe the model replicas must synchronize at every checkpoint during the backward pass. How can I check and confirm this?
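One way to test this hypothesis is to profile a single step and compare operator call counts with and without checkpointing. The sketch below assumes `torch.profiler`, a recent PyTorch release that accepts `use_reentrant=False`, and hypothetical names (`MLP`, `op_counts`). If checkpointing only adds recomputation, forward ops such as `aten::linear` should appear more often in the checkpointed run while the rest of the profile looks broadly similar.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile
from torch.utils.checkpoint import checkpoint


class MLP(nn.Module):
    """Hypothetical toy model; each Linear+ReLU pair is optionally checkpointed."""

    def __init__(self, dim, depth, use_checkpoint):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for layer in self.layers:
            x = checkpoint(layer, x, use_reentrant=False) if self.use_checkpoint else layer(x)
        return x


def op_counts(use_checkpoint, dim=256, depth=4, batch=64):
    """Profile one forward+backward step under DataParallel; return {op name: call count}."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.DataParallel(MLP(dim, depth, use_checkpoint).to(device))
    x = torch.randn(batch, dim, device=device)
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    with profile(activities=activities) as prof:
        model(x).sum().backward()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # include all queued GPU work in the profile
    return {evt.key: evt.count for evt in prof.key_averages()}


if __name__ == "__main__":
    base, ckpt = op_counts(False), op_counts(True)
    # With checkpointing, each layer's forward runs again during backward,
    # so forward ops such as aten::linear should be counted more often.
    for name in ("aten::linear", "aten::addmm"):
        print(f"{name}: baseline={base.get(name, 0)} checkpoint={ckpt.get(name, 0)}")
```

On a 2-GPU machine, calling `prof.export_chrome_trace("trace.json")` after the `with` block inside `op_counts` would write a timeline that can be opened in a trace viewer. That makes it easier to see whether one device sits idle waiting for the other during backward.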