Gathering dictionaries with NCCL for hard example mining

When doing hard example mining, it is important to keep track of the data indices so that the proper weights can be set in either the loss function or the sampler. For this purpose, my Dataset outputs a dictionary that includes the index of the sample, e.g., {'image_index': idx, 'image': image, 'target': target}.
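A minimal sketch of such a Dataset, for illustration only. The class name IndexedDataset and the in-memory images/targets arguments are assumptions, not the actual code from my project; only the dictionary keys match what is described above.

```python
import torch
from torch.utils.data import Dataset


class IndexedDataset(Dataset):
    """Hypothetical dataset that returns each sample together with its index."""

    def __init__(self, images, targets):
        # images and targets are assumed to be indexable and of equal length
        self.images = images
        self.targets = targets

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Carry the index along so per-sample losses can be mapped back later
        return {
            "image_index": idx,
            "image": self.images[idx],
            "target": self.targets[idx],
        }
```

With the default DataLoader collate, the 'image_index' entries of a batch are stacked into a single tensor; a custom collate function can cast that tensor to another dtype if needed.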

The collate function then merges the indices into a Double tensor; so far so good. When I evaluate the training set in a multi-GPU setting, I store the losses and the indices in two dictionaries (as the data types differ) and attempt to merge these dictionaries across the GPUs onto GPU 0, where I can then compute proper weights for the next epoch.

However, NCCL does not seem to support gather. I get RuntimeError: ProcessGroupNCCL does not support gather. I could copy the data to the CPU before gathering and use a different process group with gloo, but preferably I would keep these tensors on the GPU and only copy them to the CPU when the complete evaluation is done. Is there a way around this so I can gather anyway (or is there another approach)?

Could you accomplish that with all_gather? All GPUs will receive the data, but you can process it on the rank-0 GPU only.
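A rough sketch of what that could look like, under a few assumptions: the process group is already initialized, each rank holds its indices and losses as 1-D tensors on its own device, and every rank passes tensors of the same length (all_gather needs equal shapes, which is typically the case with DistributedSampler). The helper name gather_on_rank0 is made up for this example.

```python
import torch
import torch.distributed as dist


def gather_on_rank0(indices, losses):
    """All-gather indices and losses; return {index: loss} on rank 0, None elsewhere.

    Hypothetical helper. Assumes both inputs are 1-D tensors with the same
    length on every rank.
    """
    world_size = dist.get_world_size()

    # One receive buffer per rank, on the same device as the inputs
    index_parts = [torch.empty_like(indices) for _ in range(world_size)]
    loss_parts = [torch.empty_like(losses) for _ in range(world_size)]

    # Every rank receives the data from all ranks; tensors stay on the GPU
    dist.all_gather(index_parts, indices)
    dist.all_gather(loss_parts, losses)

    if dist.get_rank() != 0:
        return None

    # Only rank 0 copies to the CPU and builds the dictionary
    all_indices = torch.cat(index_parts).long().cpu().tolist()
    all_losses = torch.cat(loss_parts).cpu().tolist()
    return dict(zip(all_indices, all_losses))
```

The other ranks still pay for receiving the gathered tensors, but for one index and one loss per sample that overhead is usually small. If the sampler pads the dataset, some indices may appear more than once, and the dictionary keeps the last value seen.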