What does the DDP wrapper do before passing args into self.module?

In the newest version of PyTorch, I noticed that the DDP wrapper recursively converts all input tensors to CUDA tensors when using multi-GPU training. That behavior is mostly expected.
But recently I need some modules in my model to run in eval mode even during training (mainly for batchnorm), so I just wrote:

# model has been wrapped by DDP
model.training = True

Since batchnorm is the only layer here that behaves differently between training and testing, the code above should work properly.
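(For context, the pattern people usually suggest for this is to call `.eval()` on just the batchnorm submodules after `model.train()`, rather than assigning `.training` on the top-level wrapper, since `.training` is a per-module flag and assigning it on one module does not touch its children. A minimal sketch of that pattern, using a stdlib stand-in for `torch.nn.Module` so it runs anywhere; real code would use `torch.nn` and `isinstance(m, nn.BatchNorm2d)`:)

```python
class Module:
    """Tiny mock of torch.nn.Module: only the training-flag machinery."""
    def __init__(self):
        self.training = True
        self._children = []

    def add(self, child):
        self._children.append(child)
        return child

    def train(self, mode=True):
        # like nn.Module.train(): set own flag, then recurse into children
        self.training = mode
        for c in self._children:
            c.train(mode)
        return self

    def eval(self):
        return self.train(False)

    def modules(self):
        # like nn.Module.modules(): yield self and all descendants
        yield self
        for c in self._children:
            yield from c.modules()


class Conv(Module):
    pass


class BatchNorm(Module):
    pass


model = Module()
conv = model.add(Conv())
bn = model.add(BatchNorm())

# usual pattern: switch everything to train, then put only BN back in eval
model.train()
for m in model.modules():
    if isinstance(m, BatchNorm):
        m.eval()

print(conv.training, bn.training)  # → True False
```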

I manually set the flag model.training to True because some of my forward code depends on it, but every time I step into the training forward, the flag has automatically become False.
Since the DDP source code is complex, I just wonder: what does DDP do before passing args (and kwargs) to the model? Are there any flags or behaviors not mentioned in the docs?
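(One thing worth ruling out on your side: `nn.Module.train()` and `.eval()` recurse into every submodule, so if anything in your loop calls `model.train()` or `model.eval()` after your manual assignment, it silently overwrites the flag; and assigning `.training` on the DDP wrapper only sets the wrapper's own attribute, not `model.module.training`. A tiny stdlib sketch of that recursion, with illustrative names rather than torch's real classes:)

```python
class Module:
    """Minimal mock of torch.nn.Module's mode handling."""
    def __init__(self, *children):
        self.training = True
        self.children = list(children)

    def train(self, mode=True):
        # nn.Module.train() sets its own flag, then recurses
        self.training = mode
        for c in self.children:
            c.train(mode)
        return self

    def eval(self):
        return self.train(False)


inner = Module()
wrapper = Module(inner)   # stands in for the DDP wrapper around the model

wrapper.training = True   # manual assignment touches only the wrapper
print(inner.training)     # → True (unchanged so far)

wrapper.eval()            # any later .eval()/.train() call recurses...
wrapper.training = True   # ...and re-assigning the wrapper's flag
print(inner.training)     # → False (inner flag was overwritten)
```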


Hey @siesta, I don’t recall DDP implicitly setting .training to False
in forward. The code below is what happens before forward is called on the original model.

Is there a repro that we can dig into?

Thanks for the reply! I’m checking and rearranging my code. I’ll reply to this thread once the repro is published and I’ve confirmed the problem can be reproduced~