Hi,
I am training my model in distributed mode; the launch command looks like this:
python -m torch.distributed.launch --nproc_per_node=4 train.py
In theory, this would start 4 processes.
In my program, each process generates a list of strings:
process 1: a = ['a', 'b', 'c']
process 2: a = ['1', '2', '3']
...
I need to merge these lists into one whole list and share it among the processes, so that after this operation each process holds the same list:
a = ['a', 'b', 'c', '1', '2', '3']
How can I do this, please?
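In case it helps, here is a minimal, self-contained sketch of the behavior I am after. I am guessing that dist.all_gather_object might be the right tool; the two-process setup, the gloo (CPU) backend, the port number, and the worker/gather_demo names are just for this illustration, not from my real code:

```python
import multiprocessing as mp
import os

import torch.distributed as dist


def worker(rank, world_size, queue):
    # Each process joins the same process group. The gloo backend lets this
    # sketch run on CPU; under torch.distributed.launch the rank/world_size
    # and master address would come from the environment instead.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29507"  # arbitrary free port for the demo
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Stand-in for the per-process lists from my program.
    local = ["a", "b", "c"] if rank == 0 else ["1", "2", "3"]

    # all_gather_object pickles arbitrary Python objects and gives every
    # rank a copy of every rank's object, ordered by rank.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local)

    # Flatten the per-rank lists into one merged list.
    merged = [s for sublist in gathered for s in sublist]
    queue.put((rank, merged))
    dist.destroy_process_group()


def gather_demo(world_size=2):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, world_size, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return results  # {rank: merged_list}


if __name__ == "__main__":
    # If this is right, every rank ends up with the same merged list.
    print(gather_demo())
```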
By the way, I noticed that there is a function named torch.cuda.synchronize(). Will this function ensure that all the processes are synchronized at that line, or does it only ensure synchronization of the backend (CUDA) operations, without considering the Python frontend operations?
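My current understanding, which may well be wrong, is that torch.cuda.synchronize() only waits for the CUDA work queued on the local device, and that synchronizing the processes themselves would need dist.barrier() instead. Here is a small sketch of how I would check that; the gloo setup, port number, and worker/barrier_demo names are again purely illustrative:

```python
import multiprocessing as mp
import os
import time

import torch.distributed as dist


def worker(rank, world_size, queue):
    # Same illustrative gloo/CPU setup as above; a real job would get the
    # rank and master address from the launcher's environment variables.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29508"  # arbitrary free port for the demo
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    if rank == 1:
        time.sleep(0.5)  # rank 1 is deliberately slow before the barrier

    dist.barrier()  # no rank passes this line until every rank reaches it
    queue.put(rank)
    dist.destroy_process_group()


def barrier_demo(world_size=2):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, world_size, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    ranks = sorted(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return ranks  # ranks that made it past the barrier


if __name__ == "__main__":
    print(barrier_demo())
```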