WeightedRandomSampler has low data loading efficiency

Dear groupers,
I am working with an imbalanced dataset and have run into a strange problem with WeightedRandomSampler. My Docker environment has two RTX 3090 GPUs and 256 CPU cores. When I run the code with WeightedRandomSampler on a single GPU, it runs at normal speed. However, when I run the program on both GPUs at the same time, data loading becomes very slow and the GPUs are constantly waiting for input. I am not sure how to solve this. I tried distributed training, but it seems that WeightedRandomSampler cannot be used for sampling in distributed training.
Can anyone help me figure out what is going wrong? Thanks a lot!

The following is the configuration I used for WeightedRandomSampler.

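from torch.utils.data import WeightedRandomSampler

# weights: one sampling weight per sample in train_dataset
sampler = WeightedRandomSampler(
    weights=weights,
    num_samples=len(train_dataset),
    replacement=True,
    generator=sampling_generator,
)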

Best!
CYYJL

This topic, which asks about a weighted and distributed sampler, might be helpful.
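
In case it helps, here is a rough sketch of what a weighted sampler for distributed training could look like. It is not taken from PyTorch or from the linked topic: the class name DistributedWeightedSampler and its parameters are made up for illustration, and it uses only the Python standard library (random.choices) rather than torch.multinomial. The idea is that, when sampling with replacement, each rank can draw its own share of indices independently, seeded by (seed, epoch, rank).

```python
import random


class DistributedWeightedSampler:
    """Hypothetical helper: each rank draws its own weighted sample with replacement."""

    def __init__(self, weights, num_replicas, rank, num_samples=None, seed=0):
        if not 0 <= rank < num_replicas:
            raise ValueError("rank must be in [0, num_replicas)")
        self.weights = list(weights)
        self.num_replicas = num_replicas
        self.rank = rank
        total = len(self.weights) if num_samples is None else num_samples
        # Each rank draws ceil(total / num_replicas) indices per epoch.
        self.num_samples = -(-total // num_replicas)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call once per epoch so that every epoch uses a different draw.
        self.epoch = epoch

    def __iter__(self):
        # The seed depends on (seed, epoch, rank): reproducible, but each
        # rank and each epoch gets its own random stream.
        rng = random.Random(f"{self.seed}-{self.epoch}-{self.rank}")
        indices = rng.choices(
            range(len(self.weights)), weights=self.weights, k=self.num_samples
        )
        return iter(indices)

    def __len__(self):
        return self.num_samples
```

DataLoader accepts any iterable with __len__ as its sampler argument, so under that assumption each process would build one of these with rank and num_replicas taken from torch.distributed.get_rank() and torch.distributed.get_world_size(), pass it as sampler=..., and call set_epoch(epoch) at the start of every epoch. I have not benchmarked it, so I cannot say whether it helps with the slow data loading itself.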