In my code, every hidden layer output and the ground truth labels of the current mini-batch are used to estimate a quantity in every iteration. This works fine when a single GPU is used (say batch_size=32), since every hidden layer output and the ground truths then share the same batch dimension, i.e., 32.
But when I use DataParallel with batch_size=64 on 2 GPUs, I have ground truths of size 64, while the outputs from the hidden layers are of size 32 on each of the 2 GPUs.
How can I find out how the 64 images are divided between the 2 GPUs, so that I can select the matching 32 ground truths to use with the outputs from each hidden layer?
(Previously I was passing the entire set of ground truth values to forward hooks registered on each layer.)
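From what I understand, DataParallel scatters inputs contiguously along dim 0 with `torch.chunk`-style semantics, so GPU 0 should get samples [0:32] and GPU 1 samples [32:64]. Here is a small sketch of the split rule I am assuming (the helper `dataparallel_slices` is my own illustration, not a PyTorch API), which would let me slice the targets the same way:

```python
def dataparallel_slices(batch_size, num_gpus):
    """Return the (start, end) index range each GPU would receive.

    Assumes torch.chunk semantics: chunk size is ceil(batch_size / num_gpus),
    with the last chunk possibly smaller (and possibly fewer chunks than GPUs).
    """
    chunk = -(-batch_size // num_gpus)  # ceil division
    slices = []
    start = 0
    while start < batch_size:
        end = min(start + chunk, batch_size)
        slices.append((start, end))
        start = end
    return slices

print(dataparallel_slices(64, 2))  # [(0, 32), (32, 64)]
print(dataparallel_slices(10, 3))  # [(0, 4), (4, 8), (8, 10)]
```

An alternative I am considering: pass the targets as an extra argument to `forward` so DataParallel scatters them automatically along with the inputs, instead of handing the full target tensor to the hooks.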