I tried all the suggested solutions (setting static graph, unused parameters), but no luck:
single-GPU training runs fine, but I get this error in multi-GPU training.
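For context, a minimal sketch of what I mean by those settings (assuming a torchrun launch so LOCAL_RANK is set; the model here is just a placeholder):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes a torchrun launch so LOCAL_RANK is set in the environment.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 10).to(local_rank)  # placeholder model

# Option 1: let DDP search each iteration for parameters that got no gradient.
ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)

# Option 2 (alternative): declare the graph static so DDP can cache which
# parameters are used; the static_graph argument exists in newer releases.
# ddp_model = DDP(model, device_ids=[local_rank], static_graph=True)
```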
Hello, I am having the same problem with CAGrad. How did you solve it?
Does this work with MMDistributedDataParallel?
I don’t know what MMDistributedDataParallel is or how it differs from DistributedDataParallel.
Does DDP support multiple losses? It seems like calling loss1.backward(retain_graph=True) and then loss2.backward() wouldn’t work because of checkpointing. I’m getting the same RuntimeError.
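A minimal sketch of the usual single-backward workaround, summing the losses so DDP only runs its gradient reduction once per iteration; all names here (ddp_model, criterion1, criterion2, inputs, targets, optimizer) are placeholders from a generic training loop:

```python
# Placeholder names throughout; these come from the surrounding training loop.
out1, out2 = ddp_model(inputs)
loss1 = criterion1(out1, target1)
loss2 = criterion2(out2, target2)

# Summing the losses and calling backward() once lets DDP reduce all
# gradients in a single backward pass, instead of
# loss1.backward(retain_graph=True) followed by loss2.backward().
(loss1 + loss2).backward()
optimizer.step()
optimizer.zero_grad()
```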
I encountered the same issue using the Accelerate library from Hugging Face. Could someone explain the root cause of this issue, please?
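For reference, a minimal sketch of how the find_unused_parameters flag can be forwarded through Accelerate via its DistributedDataParallelKwargs handler (model, optimizer, dataloader are placeholders):

```python
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

# Forward DDP's find_unused_parameters flag through Accelerate's kwargs
# handler; model, optimizer, dataloader are placeholders.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```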
Why would an unused parameter cause the problem only with DDP?
DDP reduces the gradients across all ranks during the backward pass and thus expects valid .grad attributes for all properly registered parameters.
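As a toy sketch of how this comes up, a model with a registered branch that never contributes to the loss leaves those parameters without gradients:

```python
import torch.nn as nn

class TwoHeadModel(nn.Module):
    """Toy model where one head never contributes to the loss."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(10, 10)
        self.head_a = nn.Linear(10, 1)
        self.head_b = nn.Linear(10, 1)  # never used in forward

    def forward(self, x):
        feat = self.backbone(x)
        return self.head_a(feat)  # head_b's parameters get no gradient

# On a single GPU this trains fine, but DDP's reducer waits for head_b's
# gradients during the all-reduce and raises the runtime error unless the
# model is wrapped with find_unused_parameters=True (or head_b is removed).
```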