I’m using contrastive learning to train categorization models. I tried to improve performance by increasing the batch size. Each GPU can process at most 64 samples per batch, so I increased the number of GPUs from 4 to 16 to raise the total batch size from 256 to 1024. However, the experiment showed that adding GPUs does not increase the number of negative samples, because each GPU still computes the loss over only its own 64 samples. On each GPU the number of negatives is always 63, regardless of how many GPUs are used.
In DDP mode, how can we increase the number of negative samples?
For example, when the total batch size is 1024, the number of negatives per sample should be 1023 instead of 63.
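A minimal sketch of the usual fix, assuming each rank computes an InfoNCE-style loss from two views `q` and `k` of its local batch (the names `gather_across_ranks` and `global_info_nce` and the temperature value are illustrative, not from any library or from the repository mentioned below): gather the embeddings from all ranks before building the similarity matrix, so every query is scored against the keys of the whole global batch. The plain `torch.distributed.all_gather` returns tensors without gradient history, so the sketch uses the autograd-aware `all_gather` from `torch.distributed.nn.functional`.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def gather_across_ranks(z: torch.Tensor) -> torch.Tensor:
    """Concatenate every rank's local batch along dim 0 (in rank order).

    Falls back to the identity when torch.distributed is not initialized,
    e.g. when debugging in a single process.
    """
    if not (dist.is_available() and dist.is_initialized()):
        return z
    # Autograd-aware collective: its backward pass sends each rank's gradient
    # contributions back to the rank that produced the embedding.
    # (The output of the plain dist.all_gather carries no gradient.)
    from torch.distributed.nn.functional import all_gather
    return torch.cat(all_gather(z), dim=0)


def global_info_nce(q: torch.Tensor, k: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over the global batch (hypothetical helper).

    q, k: (B, D) embeddings of two views of this rank's local batch.
    After gathering, N = world_size * B, so each query has N - 1 negatives.
    """
    q_all = F.normalize(gather_across_ranks(q), dim=1)  # (N, D)
    k_all = F.normalize(gather_across_ranks(k), dim=1)  # (N, D)
    logits = q_all @ k_all.t() / temperature            # (N, N)
    # The positive for query i is key i, i.e. the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

With 16 ranks of 64 samples each, `logits` is 1024 × 1024 and every row has 1023 negatives. All ranks must contribute tensors of the same shape to the gather, so something like `drop_last=True` in the DataLoader is useful to avoid a short final batch.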
There is a GitHub repository about this question, zsnoob/EfficientDDP-4-Contrastive-Train (“Optimizing the way of contrastive learning in PyTorch-DDP (DistributedDataParallel) multi-GPU training”, github.com). It also includes an optimization for your concern that “all machine are doing the same thing in the loss calculation”. Let’s discuss it.
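On the redundancy point: if every rank gathers both views and builds the full N × N similarity matrix, all ranks end up computing the same loss. One common way to avoid this (a sketch of the general idea; it is not checked against what the repository above actually does) is to let each rank score only its own queries against the gathered keys. The per-rank logits shrink from N × N to B × N, while each query still sees N - 1 negatives. The helper names are hypothetical, and the sketch assumes every rank has the same local batch size B.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def _dist_ready() -> bool:
    return dist.is_available() and dist.is_initialized()


def gather_across_ranks(z: torch.Tensor) -> torch.Tensor:
    """Concatenate every rank's local batch along dim 0 (in rank order)."""
    if not _dist_ready():
        return z  # single-process fallback
    # Autograd-aware all_gather: gradients flow back to the owning rank.
    from torch.distributed.nn.functional import all_gather
    return torch.cat(all_gather(z), dim=0)


def local_rows_info_nce(q: torch.Tensor, k: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """Score only this rank's queries against the keys of the global batch.

    Assumes equal local batch size B on every rank. Logits are
    (B, world_size * B) instead of the full (N, N) matrix.
    """
    rank = dist.get_rank() if _dist_ready() else 0
    b = q.size(0)
    q_local = F.normalize(q, dim=1)                     # (B, D)
    k_all = F.normalize(gather_across_ranks(k), dim=1)  # (world_size * B, D)
    logits = q_local @ k_all.t() / temperature          # (B, world_size * B)
    # Local query i is paired with this rank's key i, which sits at
    # column rank * B + i of the gathered keys.
    targets = torch.arange(b, device=q.device) + rank * b
    return F.cross_entropy(logits, targets)
```

Because the gather is autograd-aware, keys still receive gradient from queries on other ranks. Combined with DDP's gradient averaging, this should give the same parameter gradient as the mean loss over the global batch, but it is worth comparing against the full-matrix version on a small run before relying on it.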