When I use DataParallel, I find the first dim of outputs is batch_size * gpu_nums, and when I calculate the loss I get this error:
ValueError: Expected input batch_size (32) to match target batch_size (16).
my code is:
model = DataParallel(model, device_ids=gpus, output_device=gpus)
outputs = model(input_ids, token_type_ids, attention_mask)
loss = loss_fct(outputs, labels.cuda(config.cuda_id))
I think the first dim of outputs should be batch_size. Can anybody help me fix this? Thanks!
Thanks! Checking some other sources: ValueError: Expected input batch_size (1) to match target batch_size (64), https://stackoverflow.com/questions/56719867/pytorch-expected-input-batch-size-12-to-match-target-batch-size-64, and ValueError: Expected input batch_size (324) to match target batch_size (4), there is likely a bug in how you've defined the shapes in your forward pass.
If your model works without DataParallel but breaks with it, it's likely because your model implicitly hardcodes the batch size it expects, probably near the beginning of the forward pass (maybe somewhere in a `view` or `reshape` call).
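A minimal sketch of what that fix usually looks like (the `Net` module and its layer sizes here are made up for illustration, not taken from your code): DataParallel scatters each batch across the GPUs, so every replica's forward pass sees a chunk of roughly batch_size / num_gpus samples, and any hardcoded batch size inside forward produces shape mismatches like yours.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # Wrong: hardcoding the batch size, e.g. x = x.view(16, -1),
        # breaks under DataParallel because each replica receives only
        # a chunk of the batch (batch_size / num_gpus samples).
        # Right: derive the batch dimension from the input itself:
        x = x.view(x.size(0), -1)
        return self.fc(x)

model = Net()
print(model(torch.randn(16, 10)).shape)  # full batch
print(model(torch.randn(8, 10)).shape)   # half-size chunk, as a replica would see
```

Because the batch dimension is read from `x.size(0)` instead of being fixed, the same forward pass works for the full batch and for the smaller per-GPU chunks, so the gathered outputs line up with your labels again.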