Evaluation gives inconsistent results when using 8 GPUs with batch size 1 and DataParallel

Hi everyone,

I’m trying to evaluate a model that behaves like this:

  • Takes images as an input
  • Uses data from the previous frame, so it has to process frames sequentially, one by one. You can think of the input as a video.
  • Uses data parallelism (nn.parallel.DataParallel). I have 8 GPUs.
  • I use shuffle = False for the DataLoader

I want to evaluate the model with batch_size = 1, because I will only have 1 GPU after deployment. When I evaluated with “CUDA_VISIBLE_DEVICES=0”, I got:

First Frame:
tensor([ 8.4655,  8.5176,  8.4897,  ..., 49.1040, 49.0972, 49.1562],
       device='cuda:0')

Second Frame:
tensor([ 8.2219,  7.8709,  8.1458,  ..., 50.7412, 50.6144, 50.0625],
       device='cuda:0')

But when I evaluate the model with all 8 GPUs visible, it produces different output, especially for the second frame:

First frame:
tensor([ 8.4655,  8.5176,  8.4897,  ..., 49.1040, 49.0972, 49.1562],
       device='cuda:0')

Second frame:
tensor([ 8.6638,  8.3508,  8.6230,  ..., 50.7001, 50.6111, 50.0625],
       device='cuda:0')

This happens even though nvidia-smi shows that GPU 0 is the only GPU being utilized.
Furthermore, if I delete the line “model = nn.parallel.DataParallel(model)”, both runs produce the same result.

Does anyone know what is happening here? And is there a better way to train or evaluate on the data sequentially?

@robertsenps I think your use case doesn’t fit the data parallelism pattern, because you want to evaluate sequentially, so it’s better not to use the DataParallel wrapper.
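
As a rough illustration of evaluating without the wrapper, here is a minimal sketch. FrameModel, unwrap and evaluate_sequentially are hypothetical names made up for this example, and it assumes the model keeps the last-frame state as a plain attribute on the module; your actual model may store it differently. If that assumption holds, it would also be one possible explanation for the mismatch: the DataParallel documentation warns that the module is replicated in each forward, so updates made to the module inside forward are lost once the replicas are destroyed.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class FrameModel(nn.Module):
    """Hypothetical stand-in for a model that reuses the previous frame."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.prev = None  # state carried from one frame to the next (assumption)

    def forward(self, frame):
        feat = self.proj(frame)
        out = feat if self.prev is None else feat + self.prev
        self.prev = feat.detach()  # remember this frame for the next call
        return out


def unwrap(model):
    # nn.DataParallel keeps the original network in its .module attribute
    return model.module if isinstance(model, nn.DataParallel) else model


def evaluate_sequentially(model, loader, device):
    # Run the bare module on a single device, one frame at a time
    model = unwrap(model).to(device).eval()
    outputs = []
    with torch.no_grad():
        for (frame,) in loader:
            outputs.append(model(frame.to(device)).cpu())
    return outputs


torch.manual_seed(0)
frames = torch.randn(3, 4)  # dummy data standing in for three video frames
loader = DataLoader(TensorDataset(frames), batch_size=1, shuffle=False)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = FrameModel()
outputs = evaluate_sequentially(model, loader, device)
```

If your checkpoint was saved from a wrapped model, calling unwrap on it first gives you back the plain module, so the same evaluation loop matches what you will run on the single GPU after deployment.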