Finding the cause of RuntimeError: Expected to mark a variable ready only once

I tried all the suggested solutions (setting a static graph, handling unused parameters) but had no luck. Single-GPU training runs fine, but I get this error in multi-GPU training.
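For reference, this is roughly where those two knobs live; a minimal sketch, assuming the script is launched with torchrun and using a placeholder model:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal sketch: assumes launch via `torchrun --nproc_per_node=N script.py`,
# which sets LOCAL_RANK and the env vars init_process_group needs.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 10).cuda()  # placeholder model

# Option 1: let DDP scan each backward pass for parameters that got no gradient.
ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)

# Option 2 (PyTorch >= 1.11): declare the graph static instead.
# ddp_model = DDP(model, device_ids=[local_rank], static_graph=True)
```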

Hello, I am having the same problem with CAGrad. How did you solve it?

Does this work with MMDistributedDataParallel?

I don’t know what MMDistributedDataParallel is or how it differs from DistributedDataParallel.

Does DDP support multiple losses? It seems like calling loss1.backward(retain_graph=True) followed by loss2.backward() wouldn’t work because of checkpointing. I’m getting the same RuntimeError.
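For what it’s worth, the workaround usually suggested for multiple losses under plain DDP is to combine them and call backward() only once per iteration, so each parameter is marked ready exactly once. A sketch, continuing the placeholder setup above:

```python
import torch
import torch.nn as nn

# Sketch only: `ddp_model` is the DDP-wrapped placeholder model from above.
criterion_cls = nn.CrossEntropyLoss()
criterion_reg = nn.MSELoss()

x = torch.randn(8, 10).cuda()
target_cls = torch.randint(0, 10, (8,)).cuda()
target_reg = torch.randn(8, 10).cuda()

out = ddp_model(x)
loss = criterion_cls(out, target_cls) + criterion_reg(out, target_reg)
loss.backward()  # one backward pass: each gradient bucket is reduced once
```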

I encountered the same issue using the Accelerate library from Hugging Face. Could someone explain the root cause of this issue, please?

Why would an unused parameter cause the problem only with DDP?

DDP reduces the gradients across all ranks during the backward pass and thus expects a valid .grad attribute for every properly registered parameter.
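As a concrete illustration of why this only bites with DDP, here is a toy model (a sketch, not from the original posts) with a branch that is skipped on some iterations:

```python
import torch
import torch.nn as nn

# On a single GPU the skipped branch simply ends up with .grad == None.
# Under DDP, the reducer waits for a gradient from every registered parameter
# before it can allreduce the corresponding bucket across ranks.
class TwoBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.always_used = nn.Linear(10, 10)
        self.sometimes_unused = nn.Linear(10, 10)

    def forward(self, x, use_second_branch: bool):
        out = self.always_used(x)
        if use_second_branch:  # when False, `sometimes_unused` gets no gradient
            out = out + self.sometimes_unused(x)
        return out

# Wrapping this in DDP without find_unused_parameters=True (or static_graph=True)
# leads to reduction errors on iterations where the second branch is skipped.
```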