DataParallel, Expected input batch_size (64) to match target batch_size (32)

zeng · June 30, 2018, 4:38am

model = nn.DataParallel(model, device_ids=[0, 1])

context, ctx_length = batch.context
response, rsp_length = batch.response
label = batch.label

prediction = self.model(context, response)
loss = self.criterion(prediction, label)

the batch size is 32, the size of prediction is 32 * 2 = 64, but the size of the label is still 32, which cause the criterion to raise Error.

What’s the problem?

Shani_Gamrian · June 30, 2018, 2:08pm

How come the predictions doesn’t match the labels? you should set the output of the prediction to 1 (will be 32 if the batch size is 32).

zeng · June 30, 2018, 2:11pm

Acctually, the size of prediction is [64, 2], however, the size of label is [32]. That’s what I am confused. When I change device_ids=[0], the size of predication become [32, 2].

Shani_Gamrian · June 30, 2018, 2:18pm

I don’t know what model you’re using so it’s hard for me to help you, but to find the problem I would probably change the batch size to 1 and see what happens, just to make sure that the model’s output size is really 1.

Aayushee_Gupta · January 27, 2019, 5:14am

Hi. Were you able to resolve this?
I am also using DataParallel to use multiple GPUs in Pytorch script but face a similar error with batch size 64 and 4 GPUs:

ValueError: Expected input batch_size (256) to match target batch_size (64)

sdeva14 · February 21, 2019, 4:53pm

It is too late to answer, but I leave the answer here for other people.
I recently had the same problem for multi-gpu usage.

Problem: It seems like the problem is that we need to pass Tensor-type variable to model.forward for DataParallel.
Otherwise, it is not going to be scattered, but copied as the number of gpu.

Solution: easy solution is to pass Tensor variable to model.forward, instead of type casting in the forward function.
Another solution is to return loss from the forward function, then normalize loss by the batch size.

Please see here for detail about DataParallel

github.com

pytorch/pytorch/blob/master/torch/nn/parallel/data_parallel.py

import operator
import torch
import warnings
from itertools import chain
from ..modules import Module
from .scatter_gather import scatter_kwargs, gather
from .replicate import replicate
from .parallel_apply import parallel_apply
from torch.cuda._utils import _get_device_index


def _check_balance(device_ids):
    imbalance_warn = """
    There is an imbalance between your GPUs. You may want to exclude GPU {} which
    has less than 75% of the memory or cores of GPU {}. You can do so by setting
    the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
    environment variable."""
    device_ids = list(map(lambda x: _get_device_index(x, True), device_ids))
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]

This file has been truncated. show original

harsh_grover · June 19, 2020, 11:33am

@sdeva14 Hey can you please post the first solution how to do that, it is not becoming clear to me.

Thanks in advance

harsh_grover · June 19, 2020, 11:34am

@Aayushee_Gupta Hey, did you resolve this?

harsh_grover · June 19, 2020, 11:36am

@zeng Hey, how did you resolve this?