I can successfully train a DDP model for an epoch across several processes. After the epoch, I want to evaluate on a validation set. If I run the evaluation only in process 0, it hangs, presumably because the DDP model is waiting for a synchronization with the other processes that never comes.
How can I get a plain, unwrapped model from the DDP model? Currently I'm using a workaround: I save the DDP model's state and load it into a temporary model, which then evaluates on the validation set.
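A rough sketch of the save-and-reload workaround described above follows. It assumes a CPU-only, single-process `gloo` setup so it can run standalone; `make_model`, `evaluate_copy`, `demo`, the toy architecture, the port, and the checkpoint path are all hypothetical. It saves `ddp_model.module.state_dict()` rather than `ddp_model.state_dict()`, because the latter prefixes every key with `module.` and would not load directly into an unwrapped model.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def make_model():
    # Hypothetical architecture; it must match the module that DDP wraps.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


def evaluate_copy(ddp_model, val_batches, path):
    # Save the wrapped module's weights; ddp_model.state_dict() would add a "module." prefix.
    torch.save(ddp_model.module.state_dict(), path)
    temp_model = make_model()
    temp_model.load_state_dict(torch.load(path, map_location="cpu"))
    temp_model.eval()
    correct = total = 0
    with torch.no_grad():  # plain nn.Module: no DDP hooks, so no collectives here
        for x, y in val_batches:
            pred = temp_model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total


def demo(port="29533"):
    # Single-process stand-in for the real multi-process launch (assumption).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", port)
    dist.init_process_group("gloo", rank=0, world_size=1)
    try:
        ddp_model = DDP(make_model())  # CPU module, so no device_ids
        # ... one epoch of DDP training would happen here ...
        val_batches = [(torch.randn(16, 4), torch.randint(0, 2, (16,))) for _ in range(3)]
        acc = None
        if dist.get_rank() == 0:
            with tempfile.TemporaryDirectory() as tmp:
                acc = evaluate_copy(ddp_model, val_batches, os.path.join(tmp, "ckpt.pt"))
        dist.barrier()  # all ranks wait here while rank 0 evaluates
        return acc
    finally:
        dist.destroy_process_group()
```

In a real multi-process run, every rank would execute the `demo`-style logic: only rank 0 evaluates, and the `dist.barrier()` keeps the other ranks from starting the next epoch's collectives while rank 0 is busy.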