I have a model that I would like to parallelize so that inference is much faster. At one point the input tensor to this network has shape 195xHxW. My network then reshapes it to -1x3xHxW, which should normally work (65x3xHxW). But because DataParallel wraps the network, it divides the 195xHxW tensor into n pieces, where n is the number of GPUs. It divides the tensor in such a way that the last piece can no longer be reshaped. Is there a way to get DataParallel to work with the model so that the tensor can still be properly reshaped?

I’m assuming that your tensor of shape 195*H*W is a single training example, so you don’t want it to be split across N GPUs, since 195 is not the batch size in this case? If you pass in your examples in a form such as batch_size * C * H * W, then DataParallel/DDP should divide along the batch size dim.
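As a rough sketch of what that could look like (the `GroupedNet` module, its conv layer, and the 16x16 spatial size are made up for illustration, since the real model wasn't posted): give the input an explicit batch dim and do the reshape inside `forward` relative to that batch dim, so each replica only ever sees whole 195-channel examples.

```python
import torch
import torch.nn as nn


class GroupedNet(nn.Module):
    """Hypothetical stand-in for the model in the question."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (B, 195, H, W), where B is the slice of the batch this replica got
        b, c, h, w = x.shape
        x = x.reshape(b * (c // 3), 3, h, w)   # (B*65, 3, H, W)
        x = self.conv(x)                       # (B*65, 8, H, W)
        # Restore the batch dim so DataParallel can gather outputs along dim 0
        return x.reshape(b, c // 3, 8, h, w)   # (B, 65, 8, H, W)


model = GroupedNet()
if torch.cuda.device_count() > 1:
    # DataParallel scatters inputs along dim 0 (the batch dim) by default
    model = nn.DataParallel(model).cuda()
elif torch.cuda.is_available():
    model = model.cuda()

sample = torch.randn(195, 16, 16)   # one example, as in the question
batch = sample.unsqueeze(0)         # (1, 195, 16, 16): add the batch dim

device = next(model.parameters()).device
with torch.no_grad():
    out = model(batch.to(device))
out_shape = tuple(out.shape)
print(out_shape)
```

Note that with a batch of one example only a single GPU does any work, so to actually get a speedup you would need to stack several examples along dim 0.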

Could you potentially paste a reproduction of this issue and the associated error message that you get?

Btw, if you are using utilities provided by PyTorch such as the DataLoader, you can set the `drop_last` argument so that the final, smaller batch is dropped and every batch has the same size.
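For reference, a small sketch of the `drop_last` behaviour (the dataset here is random placeholder data, and the sizes are picked only to make the leftover batch visible):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 10 examples of shape (195, 8, 8)
dataset = TensorDataset(torch.randn(10, 195, 8, 8))

# Default behaviour keeps the smaller final batch
keep_loader = DataLoader(dataset, batch_size=4, drop_last=False)
sizes_keep = [x.shape[0] for (x,) in keep_loader]

# drop_last=True discards the incomplete final batch
drop_loader = DataLoader(dataset, batch_size=4, drop_last=True)
sizes_drop = [x.shape[0] for (x,) in drop_loader]

print(sizes_keep)
print(sizes_drop)
```

This only makes the batches uniform. If you also want each GPU to get the same number of examples, you would additionally pick a `batch_size` that is a multiple of the number of GPUs.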