Hi.

I want to concatenate lists **with different lengths** across different GPUs when launching with `torch.distributed.launch`. Is there an API like `torch.distributed.all_reduce()` that can help me?

Example Code (test.py):

```
import random
import torch

l = []
length = random.randint(5, 8)
for i in range(1, length + 1):
    l.append(i)
print(l)
```

Run:

```
python -m torch.distributed.launch \
--nproc_per_node=4 \
--use_env \
--master_port=$RANDOM \
test.py
```

Result:

```
[1, 2, ..., length in GPU 0]
[1, 2, ..., length in GPU 1]
[1, 2, ..., length in GPU 2]
[1, 2, ..., length in GPU 3]
```

What I want (the lists from the 4 GPUs concatenated and synchronized, so every rank ends up with the same combined list):

```
[1, 2, ..., length in GPU 0, ..., length in GPU 1, ..., length in GPU 2, ..., length in GPU 3]
[1, 2, ..., length in GPU 0, ..., length in GPU 1, ..., length in GPU 2, ..., length in GPU 3]
[1, 2, ..., length in GPU 0, ..., length in GPU 1, ..., length in GPU 2, ..., length in GPU 3]
[1, 2, ..., length in GPU 0, ..., length in GPU 1, ..., length in GPU 2, ..., length in GPU 3]
```
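In case it helps to show what I'm imagining, here is a rough sketch based on `torch.distributed.all_gather_object`, which newer PyTorch versions provide for gathering picklable Python objects of varying size (the `merge_gathered` helper is just my own illustration, not part of the API):

```python
import random

import torch
import torch.distributed as dist


def merge_gathered(parts):
    # Flatten the per-rank lists into one combined list.
    return [x for part in parts for x in part]


def main():
    # torch.distributed.launch with --use_env sets RANK / WORLD_SIZE /
    # MASTER_ADDR / MASTER_PORT in the environment, so init can read them.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank)

    l = list(range(1, random.randint(5, 8) + 1))

    # all_gather_object pickles arbitrary Python objects, so the lists
    # may have different lengths on each rank.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, l)

    # Every rank now sees the same concatenation of all four lists.
    print(rank, merge_gathered(gathered))


if __name__ == "__main__":
    main()
```

I haven't verified this end to end, so I'd appreciate confirmation that `all_gather_object` is the right tool here, or whether padding tensors to a common length and using `all_gather` is preferred.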

Thanks!