PyTorch+Windows: is parallelization over multiple GPUs now possible?

Hi,

I am planning to add a new GPU to my computer. Using PyTorch on Windows, will it be possible for me to use parallelism?

In 2017 it wasn’t possible (https://github.com/pytorch/pytorch/issues/4391).
Has that changed? Will I have to switch to Linux? And is there a guide on how to set up PyTorch for data parallelism?

Thank you for the library!

Hi, I am not aware of an NCCL binary from NVIDIA that supports Windows, so parallelization over multiple GPUs on Windows is still not possible.

Thank you very much for the answer. Is NCCL installed automatically when installing CUDA on Linux, or do I need to add something else?

I’m also on a Windows system. I was able to use DataParallel on my model without any apparent errors. However, the performance was actually worse, which makes me think it’s not actually using multiple GPUs. Why am I able to use multiple GPUs in TensorFlow on a Windows system, but not in PyTorch? There must be some hack to make this work.

On Linux, NCCL and torch.distributed are enabled by default. On macOS, with PyTorch 1.3.1+, you need to conda install libuv and pkg-config, and explicitly set USE_DISTRIBUTED=1 when compiling from source. On Windows, torch.distributed is not enabled yet.
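If you want to check what your installed PyTorch build actually supports, the torch.distributed package exposes availability queries. A minimal sketch (the printed values will of course differ per platform and build):

```python
import torch
import torch.distributed as dist

# Whether this PyTorch build was compiled with distributed support at all
# (False on Windows builds at the time of this thread).
print("torch.distributed available:", dist.is_available())

if dist.is_available():
    # Per-backend availability: NCCL is Linux/CUDA only,
    # Gloo is the CPU-friendly fallback backend.
    print("NCCL backend available:", dist.is_nccl_available())
    print("Gloo backend available:", dist.is_gloo_available())
```

Note that the official Linux binaries bundle NCCL, so you normally don’t need a separate NCCL install just to use the prebuilt wheels.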

I was able to use DataParallel on my model without any apparent errors. However, the performance was actually worse, which makes me think it’s not actually using multiple GPUs.

DataParallel is single-process, multi-thread data parallelism, and it replicates the input module onto every GPU in each forward pass, which is expected to be slow. But it is a very convenient entry point for enabling parallelism: you don’t need to do anything else to enable it, and it should work fine if the batch size is large enough to amortize the model-replication overhead.
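The usage described above can be sketched as follows. The model and shapes here are illustrative, not from the thread; the key point is that nn.DataParallel wraps the module and splits the batch along dim 0 across visible GPUs, falling back to the plain module when fewer than two GPUs are present:

```python
import torch
import torch.nn as nn

# A toy model standing in for any nn.Module (hypothetical example).
model = nn.Linear(128, 10)

if torch.cuda.device_count() > 1:
    # Replicates the module across all visible GPUs on every forward pass
    # and scatters the input batch along dim 0.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# A large batch helps amortize the per-forward replication overhead.
x = torch.randn(256, 128, device=device)
out = model(x)  # gathered back onto the default device, shape (256, 10)
```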

Why am I able to use multiple GPUs in TensorFlow on a Windows system

We are working on using libuv to enable that on Windows, as @pietern did for macOS, but the timeline is TBD.