How is multi-threading managed within DDP?

You are so right!!!
torch.distributed turned off multithreading, so when the C++ function later asked for the maximum number of threads, it got 1.
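To see what I mean, here is a minimal sketch of checking (and restoring) the intra-op thread count inside a worker; the value 32 is just an example, not a recommendation:

```python
import torch

# Under torch.distributed launchers the intra-op thread pool often
# defaults to 1 (via OMP_NUM_THREADS), so C++ ops see a single thread.
print(torch.get_num_threads())

# The count can be raised again inside each worker process;
# 32 is an example value, tune it for your machine.
torch.set_num_threads(32)
print(torch.get_num_threads())
```

Setting the environment variable before launch (as below) has the same effect without touching the training code.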

– begin side question
Side question: how can I ask DDP to log into a specific file?
The DDP logging goes to the terminal, which makes catching warnings difficult.
There are a lot of messages, but they get lost.
Is there a way to ask DDP to write its logs to a specific file?
Or can we properly manipulate the logging instance here to do that? Thanks.
The first time I used DDP, it started throwing logs into the terminal, which is impractical for debugging.
There should be a way to tell DDP to log to a file.
Doc1 and doc2 do not seem to cover this.
I didn't investigate further, as other things have higher priority.
Thanks!

–end side question
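One possible workaround for the side question (a sketch only, not something the docs confirm): PyTorch's Python-side distributed messages go through the standard `logging` module, so attaching a `FileHandler` to the `torch.distributed` logger may capture some of them. Output emitted from the C++ layer or from the launcher process itself may still only reach the terminal. The filename is a placeholder:

```python
import logging

# Attach a file handler to the torch.distributed logger hierarchy;
# child loggers (e.g. torch.distributed.distributed_c10d) propagate
# their records up to it. "ddp.log" is a placeholder -- in a real run
# you would likely include the rank so processes don't clobber each other.
logger = logging.getLogger("torch.distributed")
handler = logging.FileHandler("ddp.log")
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```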

So now I export OMP_NUM_THREADS=32 before running, and the runtime is back to 70 ms with DDP + 2 GPUs!!! This is cool!
Thank you very much! This is a life saver!
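For reference, this is the shape of the command I mean; the GPU count and train.py are placeholders for your own setup:

```shell
# Give each DDP process a larger intra-op thread budget; every worker
# inherits the variable. 32 is what worked on my machine -- tune it.
export OMP_NUM_THREADS=32

# Then launch as usual (2 GPUs here; train.py is a placeholder):
python -m torch.distributed.launch --nproc_per_node=2 train.py
```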

Also, I was getting this warning, which is easy to miss because it is only printed to the terminal:

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to 
be 1 in default, to avoid your system being overloaded, please further tune 
the variable for optimal performance in your application as needed.
*****************************************

which explains everything!

Also, I am using torch.distributed.launch, which explains why I am getting this warning:

The module torch.distributed.launch is deprecated and going to be removed
in future. Migrate to torch.distributed.run

In the examples they provided, they use launch. I should probably switch to run, as it seems more up to date.
They also used launch in this, under "Launch utility". The docs probably need to be updated in the next release. I am using PyTorch 1.9.0.

Again, thank you so much! This was very helpful!