How do multiple GPUs interconnect when using DataParallel to train a model on multiple GPUs?

Hi all,

I am trying to train a model using multiple GPUs on a single machine, and I want to figure out how the GPUs connect with each other (I mean the topology of the connection: fully connected? in parallel? in series?).


If you want to check how the devices are connected inside your machine, you can run nvidia-smi topo -m in your terminal.
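A small sketch of running that command from Python, in case you want to inspect the topology programmatically. The helper name is hypothetical; the nvidia-smi topo -m output itself is a matrix showing, for each GPU pair, the link type (e.g. NV# for NVLink, PIX/PXB for PCIe, SYS for a path crossing the inter-socket interconnect):

```python
import shutil
import subprocess

def print_gpu_topology():
    """Hypothetical helper: print the GPU interconnect topology matrix.

    Falls back gracefully on machines without an NVIDIA driver.
    """
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; no NVIDIA driver on this machine")
        return
    result = subprocess.run(
        ["nvidia-smi", "topo", "-m"],
        capture_output=True, text=True,
    )
    print(result.stdout)

print_gpu_topology()
```

The legend printed below the matrix explains each abbreviation, so you can tell whether two GPUs talk over NVLink directly or have to go through the PCIe hierarchy (and possibly the CPU).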

I might have misunderstood the question, but nn.DataParallel replicates the model on each device, as explained here.

@ptrblck thanks for your reply! I used the DataParallel module, and after checking the code I found that GPU0 gathers the results from the other GPUs (e.g. GPU1 and GPU2), performs some computation itself, and then scatters back to the others. So I want to understand the network between them (the way they communicate), and also how the GPUs communicate with the CPU in this DataParallel approach. I tried to dive into the C++ code, but I am not sure where to look.
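For reference, the scatter/gather flow described above can be sketched with the real helpers that nn.DataParallel uses internally (torch.nn.parallel.scatter, replicate, parallel_apply, gather); the wrapper function below is my own illustration, not the actual implementation:

```python
import torch
import torch.nn as nn

def data_parallel_forward(module, batch, device_ids):
    """Illustrative sketch of one nn.DataParallel forward pass:
    scatter inputs -> replicate model -> run in parallel -> gather on GPU 0.
    """
    from torch.nn.parallel import replicate, parallel_apply
    from torch.nn.parallel.scatter_gather import scatter, gather

    inputs = scatter(batch, device_ids)        # split the batch across GPUs
    replicas = replicate(module, device_ids)   # broadcast model from GPU 0
    outputs = parallel_apply(replicas, inputs) # forward on each replica
    return gather(outputs, device_ids[0])      # collect results on GPU 0

# Only meaningful on a machine with at least two GPUs.
if torch.cuda.device_count() >= 2:
    model = nn.Linear(8, 4).cuda(0)
    x = torch.randn(16, 8)
    y = data_parallel_forward(model, x, [0, 1])
    print(y.shape)
```

The inter-GPU copies in replicate/scatter/gather go over whatever link the topology matrix shows (NVLink or PCIe, possibly via the host), which is why nvidia-smi topo -m is the right tool for the original question.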


You can find the implementation in

That helps. Thanks for your help!