How to gather a list of strings from different rank machines in DDP mode with NCCL backend?

Hi,
I’ve recently been accelerating model training with DistributedDataParallel, using NCCL as the torch.distributed backend. For validation I need to work with a list of strings held in memory, but with the multi-process setup it’s much harder to share that list across ranks than it was in DP mode. Is there a good way to solve this?

There is a PR to provide such a feature for general Python objects, but it hasn’t landed yet. You can copy that code for now.
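
In case it helps, here is a minimal sketch of the same idea that PR uses: pickle the local list into a byte tensor, pad to a common size (NCCL’s all_gather requires equal-sized tensors on every rank), all_gather the buffers, then trim and unpickle each rank’s contribution. The helper name `all_gather_strings` and the device handling are my own choices for illustration, not taken from the PR:

```python
import pickle

import torch
import torch.distributed as dist


def all_gather_strings(local_strings, device):
    """Gather every rank's list of strings onto all ranks.

    Hypothetical helper; assumes init_process_group(backend="nccl")
    has already been called and `device` is this rank's CUDA device.
    """
    world_size = dist.get_world_size()

    # Serialize the local list into a CUDA uint8 tensor
    # (NCCL only moves CUDA tensors).
    payload = torch.tensor(
        list(pickle.dumps(local_strings)), dtype=torch.uint8, device=device
    )

    # Exchange payload sizes first, since NCCL all_gather needs
    # equal-sized tensors across ranks.
    local_size = torch.tensor([payload.numel()], dtype=torch.long, device=device)
    sizes = [torch.zeros_like(local_size) for _ in range(world_size)]
    dist.all_gather(sizes, local_size)
    max_size = max(int(s.item()) for s in sizes)

    # Pad to the maximum size, gather, then trim each buffer back to
    # its true length and unpickle.
    padded = torch.cat([payload, payload.new_zeros(max_size - payload.numel())])
    gathered = [
        torch.empty(max_size, dtype=torch.uint8, device=device)
        for _ in range(world_size)
    ]
    dist.all_gather(gathered, padded)

    merged = []
    for tensor, size in zip(gathered, sizes):
        raw = bytes(tensor[: int(size.item())].cpu().tolist())
        merged.extend(pickle.loads(raw))
    return merged
```

Called as `all_gather_strings(my_strings, torch.device("cuda", local_rank))` on every rank, it returns the concatenation of all ranks’ lists on each rank, which should be enough for validation over strings.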

Thanks for your reply, @mrshenli. This design is interesting.