I was wondering if anybody has run into a problem while training with mp.spawn() and DistributedDataParallel on two GPUs (one process per GPU), where wandb gets stuck and won't let the training continue.
Are you using wandb in both processes or just one? I would suggest having a single process (e.g. rank 0) handle logging and collect info from the other processes, as in the sketch below.
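Something along these lines (a minimal sketch, assuming two GPUs; the project name, toy model, and training loop are just placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
import wandb

def train(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Only rank 0 talks to wandb; the other process never calls it.
    if rank == 0:
        wandb.init(project="ddp-example")  # placeholder project name

    model = torch.nn.Linear(10, 1).to(rank)
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for step in range(100):
        optimizer.zero_grad()
        loss = ddp_model(torch.randn(20, 10, device=rank)).pow(2).mean()
        loss.backward()
        optimizer.step()
        if rank == 0:
            wandb.log({"loss": loss.item()}, step=step)

    if rank == 0:
        wandb.finish()
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
```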
I am calling wandb.init() in both processes. I also tried logging from a single process, but it still gets stuck.