In my code, every hidden layer output and the ground truth labels of the current mini-batch are used to estimate a quantity in every iteration. This works fine when a single GPU is used (say batch_size=32), since every hidden layer output and the ground truths then share the same batch dimension, i.e., 32.
But when I use DataParallel with batch_size=64 on 2 GPUs, I have ground truths of size 64, while the outputs from the hidden layers are of size 32 on each of the 2 GPUs.
How can I find out how the 64 images are divided between the 2 GPUs, so that I can select the matching 32 ground truths to use with the outputs from each hidden layer?
(Previously I was passing the entire set of ground truth values to forward hooks registered on each layer.)
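From what I understand, DataParallel scatters inputs contiguously along dim 0 with `torch.chunk`-style semantics, so GPU 0 should get samples [0:32] and GPU 1 samples [32:64]. Here is a small sketch of the split rule I am assuming (the helper `dataparallel_slices` is my own illustration, not a PyTorch API), which would let me slice the targets the same way:

```python
def dataparallel_slices(batch_size, num_gpus):
    """Return the (start, end) index range each GPU would receive.

    Assumes torch.chunk semantics: chunk size is ceil(batch_size / num_gpus),
    with the last chunk possibly smaller (and possibly fewer chunks than GPUs).
    """
    chunk = -(-batch_size // num_gpus)  # ceil division
    slices = []
    start = 0
    while start < batch_size:
        end = min(start + chunk, batch_size)
        slices.append((start, end))
        start = end
    return slices

print(dataparallel_slices(64, 2))  # [(0, 32), (32, 64)]
print(dataparallel_slices(10, 3))  # [(0, 4), (4, 8), (8, 10)]
```

An alternative I am considering: pass the targets as an extra argument to `forward` so DataParallel scatters them automatically along with the inputs, instead of handing the full target tensor to the hooks.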