DDP and Gradient checkpointing

Hi everyone,
I tried to use torch.utils.checkpoint along with DDP. However, after the first iteration, the program hung. I read a thread in the forum from last year in which someone said that DDP and checkpointing don't work together yet. Is that true? Any suggestions for my case? Thank you.
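For context, here is a minimal sketch of the pattern I mean, assuming one GPU per process launched with torchrun (the model, sizes, and names are illustrative, not my actual code):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 10)
        self.block2 = nn.Linear(10, 10)

    def forward(self, x):
        x = self.block1(x)
        # re-compute block2's activations during backward instead of storing them
        return checkpoint(self.block2, x)


dist.init_process_group("nccl")
rank = dist.get_rank()
model = DDP(Net().to(rank), device_ids=[rank])

for _ in range(2):
    out = model(torch.randn(8, 10, device=rank))
    out.sum().backward()  # the hang I describe shows up around here
```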

Hi,

I am afraid this is true.
We are working on a solution for this in 1.10.

We currently have a prototype API, `_set_static_graph`, which can be applied to DDP if your training graph is static across all iterations (i.e., there is no conditional execution in the model). Documentation: pytorch/distributed.py at master · pytorch/pytorch · GitHub.

With static graph training, DDP records the number of times each parameter expects to receive a gradient and memorizes it, which resolves the issue around activation checkpointing and should make the combination work.
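As a rough sketch, applying it to the example above would look like this (`_set_static_graph` is a private prototype method on the DDP instance, so the exact spelling may change between releases):

```python
model = DDP(Net().to(rank), device_ids=[rank])
# prototype API: promise DDP that the autograd graph is identical every
# iteration, so it can count how many gradients each parameter will produce
model._set_static_graph()
```

In later releases this was exposed as the public `static_graph=True` constructor argument to `DistributedDataParallel`, and, as far as I know, non-reentrant checkpointing (`checkpoint(..., use_reentrant=False)` in newer versions) also avoids the hang without requiring a static graph.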

I don’t understand what the issue is. Why did your code hang? That is essential information to include here. Did you try any of the following:

If none of them worked, can you provide more details? Your original post does not describe enough to know what the problem is. Things can hang for many reasons, especially in complicated multiprocessing code.

Hi @albanD, did you find a solution to this in 1.10.0?