Hi,
I am training my model in distributed mode; the launch command looks like this:
python -m torch.distributed.launch --nproc_per_node=4 train.py
In theory, this would start 4 processes.
In my program, each process generates a list of strings:
process 1: a = ['a', 'b', 'c']
process 2: a = ['1', '2', '3']
...
I need to merge these lists into one whole list and share it among the processes, so that after this operation each process holds the same list:
a = ['a', 'b', 'c', '1', '2', '3']
How can I do this, please?
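In case it helps, here is a minimal, self-contained sketch of the behavior I am after. I am guessing that dist.all_gather_object might be the right tool; the two-process setup, the gloo (CPU) backend, the port number, and the worker/gather_demo names are just for this illustration, not from my real code:

```python
import multiprocessing as mp
import os

import torch.distributed as dist


def worker(rank, world_size, queue):
    # Each process joins the same process group. The gloo backend lets this
    # sketch run on CPU; under torch.distributed.launch the rank/world_size
    # and master address would come from the environment instead.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29507"  # arbitrary free port for the demo
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Stand-in for the per-process lists from my program.
    local = ["a", "b", "c"] if rank == 0 else ["1", "2", "3"]

    # all_gather_object pickles arbitrary Python objects and gives every
    # rank a copy of every rank's object, ordered by rank.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local)

    # Flatten the per-rank lists into one merged list.
    merged = [s for sublist in gathered for s in sublist]
    queue.put((rank, merged))
    dist.destroy_process_group()


def gather_demo(world_size=2):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, world_size, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return results  # {rank: merged_list}


if __name__ == "__main__":
    # If this is right, every rank ends up with the same merged list.
    print(gather_demo())
```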
By the way, I noticed that there is a function named torch.cuda.synchronize(). Will this function ensure that all the processes are synchronized at that line, or does it only ensure synchronization of the backend (CUDA) operations, without considering the Python frontend operations?
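My current understanding, which may well be wrong, is that torch.cuda.synchronize() only waits for the CUDA work queued on the local device, and that synchronizing the processes themselves would need dist.barrier() instead. Here is a small sketch of how I would check that; the gloo setup, port number, and worker/barrier_demo names are again purely illustrative:

```python
import multiprocessing as mp
import os
import time

import torch.distributed as dist


def worker(rank, world_size, queue):
    # Same illustrative gloo/CPU setup as above; a real job would get the
    # rank and master address from the launcher's environment variables.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29508"  # arbitrary free port for the demo
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    if rank == 1:
        time.sleep(0.5)  # rank 1 is deliberately slow before the barrier

    dist.barrier()  # no rank passes this line until every rank reaches it
    queue.put(rank)
    dist.destroy_process_group()


def barrier_demo(world_size=2):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, world_size, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    ranks = sorted(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return ranks  # ranks that made it past the barrier


if __name__ == "__main__":
    print(barrier_demo())
```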