How to do inference with DistributedDataParallel?

asivap · February 26, 2021, 12:44am

I can successfully train a DDP model for an epoch across several processes. After the epoch, I want to evaluate on a cross-validation set. If I do this only in process 0, it hangs, presumably because it's waiting for a synchronization that never comes.

How can I get a normal (non-DDP) model from the DDP model? Currently, I'm using a workaround where I save the DDP model's state and load it into a temporary model. This temporary model then evaluates on the validation set.
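
For reference, here is a minimal sketch of that workaround. The names `strip_ddp_prefix`, `evaluate_copy`, `make_model`, and `run_demo` are made up for illustration, the `nn.Linear` model is a stand-in, and the single-process `gloo` group exists only so the snippet runs on its own on CPU. It assumes the state is saved from the DDP wrapper itself, in which case the parameter names carry a `module.` prefix that has to be stripped before loading into an unwrapped model.

```python
import os
import socket
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def strip_ddp_prefix(state_dict, prefix="module."):
    """Drop the 'module.' prefix that the DDP wrapper adds to parameter names."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


def evaluate_copy(ddp_model, make_model, val_inputs):
    """Save the DDP state, load it into a fresh plain model, run it in eval mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ddp_state.pt")
        torch.save(ddp_model.state_dict(), path)
        state = torch.load(path, map_location="cpu")
    temp_model = make_model()  # hypothetical factory for the same architecture
    temp_model.load_state_dict(strip_ddp_prefix(state))
    temp_model.eval()
    with torch.no_grad():
        return temp_model(val_inputs)


def _free_port():
    """Ask the OS for an unused TCP port for the demo rendezvous."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run_demo():
    # One-process group on CPU, only so that DDP can be constructed here.
    dist.init_process_group(
        "gloo",
        init_method=f"tcp://127.0.0.1:{_free_port()}",
        rank=0,
        world_size=1,
    )
    try:
        base = nn.Linear(4, 2)   # stand-in for the real model
        ddp_model = DDP(base)    # the wrapper shares parameters with `base`
        x = torch.randn(3, 4)    # stand-in for a validation batch
        out = evaluate_copy(ddp_model, lambda: nn.Linear(4, 2), x)
        with torch.no_grad():
            expected = base(x)
        return out, expected
    finally:
        dist.destroy_process_group()
```

In this sketch the temporary model is an ordinary `nn.Module`, so running it in a single process involves no collective communication. The cost is an extra save, load, and model construction every epoch, which is what I'd like to avoid.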