Questions about Gradient checkpointing

Hey, I have two questions regarding gradient checkpointing:

1. Do Dropout and BatchNorm layers now work with checkpointing?

I remember these layers being problematic with checkpointing in the past. Is that still the case? Parts of the PyTorch documentation suggest this may have changed, at least for dropout. Can someone provide a definitive answer?
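For context, here is a minimal sketch of what I'm testing (assuming a recent PyTorch where `torch.utils.checkpoint.checkpoint` takes a `use_reentrant` argument): dropout inside a checkpointed block, relying on `preserve_rng_state=True` (the default) to replay the same dropout mask during recomputation.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# A block containing dropout, run under activation checkpointing
block = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.ReLU())
x = torch.randn(2, 8, requires_grad=True)

# preserve_rng_state=True (the default) should make the recomputed
# forward pass reuse the same dropout mask, keeping gradients consistent
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
print(x.grad.shape)
```

Does this pattern give correct gradients, or do I still need to move dropout outside the checkpointed segment?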

2. What is the preferred way to use data parallelism with `checkpoint_sequential`?
Can someone give me a template for both `DataParallel` and `DistributedDataParallel` that actually works with multiple GPUs?
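This is roughly the shape I have in mind, a single-process CPU sketch (world size 1, `gloo` backend) just to show the structure; in real use it would be launched with `torchrun` across GPUs. I'm assuming `use_reentrant=False` is the variant that cooperates with DDP:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint_sequential

# Single-process setup purely for illustration; a real run would use
# torchrun and one process per GPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

    def forward(self, x):
        # Checkpoint the sequential body in 2 segments;
        # use_reentrant=False is (I believe) required for DDP compatibility
        return checkpoint_sequential(self.body, 2, x, use_reentrant=False)

model = DDP(Net())
out = model(torch.randn(2, 8))
out.sum().backward()
dist.destroy_process_group()
```

For `DataParallel` I assume one would just wrap the same module with `nn.DataParallel(Net())` instead, but I'm not sure whether checkpointing inside the replicated forward behaves correctly there, which is why I'm asking for a known-good template.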
