Deep learning model accuracy changes when the number of training GPUs changes

Hello everyone,

I’m new to PyTorch and deep learning, and I have a question about a supervised action recognition model based on a Graph Convolutional Network (GCN). I ran two experiments in which I trained and tested the model with all hyperparameters and the dataset held constant, changing only the number of GPUs, and I observed a significant accuracy difference: with 2 GPUs I got a top-1 accuracy of 89.06%, while with 4 GPUs I got 89.96%.

I want to highlight that I initialize the model’s weights with a fixed random seed to ensure reproducibility. Despite my efforts, I haven’t been able to identify the cause of this accuracy difference. Has anyone else encountered a similar issue, or does anyone have insight into what might be happening?
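
For reference, my seeding is essentially the standard recipe below (a minimal sketch; the `set_seed` helper and the seed value are illustrative, and my actual script may differ in details):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Hypothetical helper: seed Python, NumPy, and PyTorch (CPU and all GPUs).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic cuDNN kernels (can slow training down).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```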

I’m conducting these experiments on:

  • GPU: Tesla P100-PCIE-16GB
  • CUDA version: 12.2
  • torch version: 1.7.1+cu110
  • Multi-GPU setup: torch.nn.DataParallel (see the sketch below)
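
For completeness, this is essentially how I wrap the model (a minimal sketch using a placeholder network, not my real GCN):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for my actual GCN (hypothetical).
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 60),
)

# DataParallel splits each input batch along dim 0 across all visible GPUs,
# runs the replicas in parallel, and gathers the outputs back on GPU 0.
model = nn.DataParallel(model).cuda()

batch = torch.randn(64, 64).cuda()  # global batch of 64 samples
out = model(batch)  # with 4 GPUs each replica sees 16 samples; with 2 GPUs, 32
```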

Any experiences or ideas would be greatly appreciated. Thanks!