I wanted to ask: does using DistributedDataParallel affect validation/testing? I'm wondering whether it is still possible to "pause" after each epoch and evaluate the model.
In other words, does DistributedDataParallel prevent testing the model inside the training loop (after an epoch)?
Since the model is replicated across devices and each process trains on a different shard of the data, I was unsure whether evaluating the model would be an issue.
Also, if testing is possible, should it be done on a single rank only, or should the evaluation loop run on all training processes? How should it be incorporated?
For reference, the docs are here.