CPU inputs to DataParallel

Hello all,

I’m wondering whether DataParallel will be updated to take CPU inputs in an upcoming release, or whether there is a workaround that doesn’t involve moving the whole batch to a GPU before the forward pass.

I’m not using the most up-to-date release, so my concern may be outdated, but right now when I call a DataParallel module on a CPU input I get this error:

“Broadcast function not implemented for CPU tensors”

This limitation can become a big problem when running on several GPUs at once: with the entire batch on a single GPU, that GPU hits an out-of-memory error much sooner than the others, so the overall batch size has to be reduced to accommodate it.

Thank you for your time.

Yes, memory usage will be higher on one GPU, but only slightly. The amount of memory needed to store inputs and coalesce is usually small compared to the amount needed for autograd.
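
To get a feel for the size of that overhead, here is a rough back-of-the-envelope sketch. The batch shape, the float32 dtype, and the GPU count are made-up numbers for illustration, not taken from the question, and `tensor_bytes` is just a hypothetical helper. It only estimates the input tensor itself, so you would still need to compare the result against what your own model uses for activations and gradients.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Assumed example values: 256 RGB images at 224x224, split across 4 GPUs.
batch_shape = (256, 3, 224, 224)
num_gpus = 4

# Whole input batch sitting on the first GPU before it is split up.
full_batch = tensor_bytes(batch_shape)

# The slice each GPU would hold if the batch were split evenly.
per_gpu_chunk = tensor_bytes((batch_shape[0] // num_gpus,) + batch_shape[1:])

# Extra input memory the first GPU carries compared to the others.
extra_on_first = full_batch - per_gpu_chunk

MIB = 1024 ** 2
print(f"full batch:     {full_batch / MIB:.2f} MiB")
print(f"per-GPU chunk:  {per_gpu_chunk / MIB:.2f} MiB")
print(f"extra on GPU 0: {extra_on_first / MIB:.2f} MiB")
```

With these assumed numbers the first GPU holds roughly 110 MiB more input data than the others. Whether that counts as "slight" depends on how much memory the forward and backward passes of your model need, which you can check by watching per-GPU memory usage during a training step.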