In DDP mode, how can I increase the number of negative samples in contrastive learning?

skye95git · July 22, 2022, 4:07pm

I’m using contrastive learning to train categorization models, and I tried to get better performance by increasing the batch size. Each GPU can process a batch of at most 64 samples, so I changed the number of GPUs from 4 to 16 to raise the total batch size from 256 to 1024. However, the experiment showed that adding GPUs does not increase the number of negative samples, because the loss on each GPU is computed only over that GPU’s own batch of 64. For each GPU, the number of negative samples is therefore always 63, regardless of how many GPUs are used.
In DDP mode, how can we increase the number of negative samples?
For example, when the total batch size is 1024, the number of negative samples should be 1023 instead of 63.

anj · July 26, 2022, 1:13pm

Can you try increasing the total batch size to 1024 with 4 GPUs, which should increase the per-GPU batch size to 256? That would increase the number of samples seen by each GPU. Is that what you are trying to accomplish?

cc @agu @rvarm1

zhenxz0121 · December 29, 2023, 4:44pm

There is a GitHub repository that addresses this question: zsnoob/EfficientDDP-4-Contrastive-Train: Optimizing the way of contrastive learning in PyTorch-DDP(DistributedDataParallel) multi-GPU training (github.com). It also includes some optimizations for your concern about “all machine are doing the same thing in the loss calculation”. Let’s discuss it.
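
For readers landing here, one common way to get cross-GPU negatives is to gather the embeddings from every rank with `torch.distributed.all_gather` before computing the loss. The sketch below is only an illustration of that general idea and is not taken from the repository above. The helper names `gather_embeddings` and `info_nce`, the temperature value, and the InfoNCE-style loss are all assumptions, and it assumes every rank holds the same local batch size. Tensors returned by `all_gather` carry no autograd history, so the local shard is put back in place to keep gradients flowing for the local samples; embeddings from other ranks act as constants.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def gather_embeddings(local):
    """Concatenate embeddings from all ranks (hypothetical helper).

    Returns the global tensor and the row offset of this rank's shard.
    Falls back to the local tensor when no process group is initialized.
    """
    if not (dist.is_available() and dist.is_initialized()):
        return local, 0
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    # all_gather requires the same shape on every rank.
    gathered = [torch.zeros_like(local) for _ in range(world_size)]
    dist.all_gather(gathered, local.contiguous())
    # Gathered copies have no grad; restore the local tensor so that
    # gradients still reach this rank's encoder.
    gathered[rank] = local
    return torch.cat(gathered, dim=0), rank * local.shape[0]


def info_nce(query, key, temperature=0.1):
    """InfoNCE-style loss with keys from all GPUs as negatives (a sketch)."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    all_keys, offset = gather_embeddings(key)
    # Shape: (local_batch, global_batch). With 16 GPUs x 64 samples,
    # each query is compared against 1024 keys: 1 positive, 1023 negatives.
    logits = query @ all_keys.t() / temperature
    # The positive for local row i sits at global column offset + i.
    targets = torch.arange(query.shape[0], device=query.device) + offset
    return F.cross_entropy(logits, targets)
```

In a training loop this would replace the per-GPU loss, e.g. `loss = info_nce(model(view_a), model(view_b))` followed by `loss.backward()`, where `model` is the DDP-wrapped encoder and `view_a`/`view_b` are two augmented views of the same local batch (both names are placeholders).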