I’m wondering if DataParallel will be updated to accept CPU inputs in an upcoming release, or if there is a workaround that doesn’t involve moving the whole batch to a GPU before the forward pass.
I’m not on the most up-to-date release, so maybe my concern is outdated, but right now when I call a DataParallel module on a CPU input I get the error:
“Broadcast function not implemented for CPU tensors”
This limitation becomes a real problem when running on several GPUs at a time: with the entire batch sitting on a single GPU, that device hits a memory error much sooner than the others, so the batch size has to be reduced to accommodate it.
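For context, here is a minimal sketch of what I’m doing now (the model and shapes are made up for illustration):

```python
import torch
import torch.nn as nn

if torch.cuda.is_available():
    # Toy model wrapped in DataParallel; replicas go on all visible GPUs.
    model = nn.DataParallel(nn.Linear(10, 5).cuda())

    batch = torch.randn(64, 10)  # batch starts out on the CPU

    # Calling model(batch) directly on the CPU tensor raises:
    #   "Broadcast function not implemented for CPU tensors"

    # Current workaround: move the entire batch onto GPU 0 first,
    # so device 0 holds the full batch plus its own shard of the work.
    out = model(batch.cuda(0))
```

Ideally the scatter step could read directly from host memory so each GPU only ever holds its own slice of the batch.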
Thank you for your time.