How to link a custom NCCL version

OasisArtisan · December 29, 2020, 11:32pm

So I’m working on a project where I had to modify NCCL a bit to serve my purpose.

Now my question is: how can I force PyTorch to use my version of NCCL?

To start with, is NCCL dynamically linked, so that PyTorch would automatically pick up whatever version of NCCL is available? Or is it statically linked, so that I need to recompile PyTorch with my custom NCCL version?

Any pointers or tips are appreciated.

Cheers

osalpekar · December 29, 2020, 11:45pm

@OasisArtisan PyTorch pins a specific version of NCCL as a git submodule (third_party/nccl). If you want to use a different version of NCCL, you can rebuild PyTorch from source with the USE_SYSTEM_NCCL flag set.
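For illustration, a minimal sketch of such a rebuild driven from Python. It assumes a PyTorch source checkout and a custom NCCL built with `make`, which places headers and libraries under `build/include` and `build/lib`. The helper names and paths are hypothetical. `USE_SYSTEM_NCCL`, `NCCL_ROOT`, `NCCL_INCLUDE_DIR` and `NCCL_LIB_DIR` are, as far as I know, the variables listed in the comment block at the top of PyTorch's `setup.py`, but check them against the version you are building.

```python
import os
import subprocess
import sys


def nccl_build_env(nccl_build_dir, base_env=None):
    """Return a copy of the environment that points a PyTorch source build
    at a custom NCCL instead of the bundled third_party/nccl submodule.

    Assumes `nccl_build_dir` contains `include/` and `lib/`, which is the
    layout `make` produces under `build/` in an NCCL checkout.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["USE_SYSTEM_NCCL"] = "1"  # use the external NCCL, not the submodule copy
    env["NCCL_ROOT"] = nccl_build_dir
    env["NCCL_INCLUDE_DIR"] = os.path.join(nccl_build_dir, "include")
    env["NCCL_LIB_DIR"] = os.path.join(nccl_build_dir, "lib")
    return env


def rebuild_pytorch(pytorch_src, nccl_build_dir):
    """Run `python setup.py develop` in a PyTorch checkout (this takes a while)."""
    subprocess.run(
        [sys.executable, "setup.py", "develop"],
        cwd=pytorch_src,
        env=nccl_build_env(nccl_build_dir),
        check=True,
    )
```

Calling something like `rebuild_pytorch("/path/to/pytorch", "/path/to/nccl/build")` is equivalent to exporting those variables in a shell before running `python setup.py develop` yourself.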

Here’s a similar forum question: NCC version and Pytorch NCCL version mismatch

OasisArtisan · December 30, 2020, 12:10am

I see. And if I set the USE_SYSTEM_NCCL flag, would NCCL be linked to PyTorch dynamically or statically?

To illustrate my intention: I want to know whether I need to recompile PyTorch every time I change something in my custom NCCL version. If it's linked dynamically, then as long as I keep the same NCCL interface, I shouldn't need to recompile PyTorch.
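If NCCL is linked dynamically, the loader resolves `libnccl.so.2` when PyTorch's CUDA library is loaded, so one common way to try a rebuilt NCCL without recompiling PyTorch (on Linux) is to put its `lib` directory first on `LD_LIBRARY_PATH`. This is a sketch under two assumptions: the rebuilt library keeps the same soname and ABI, and no RPATH baked into the PyTorch libraries takes precedence over `LD_LIBRARY_PATH`. The helper names are hypothetical.

```python
import os
import subprocess
import sys


def with_library_dir_first(lib_dir, base_env=None):
    """Return a copy of the environment with `lib_dir` prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + current if current else lib_dir
    return env


def run_with_custom_nccl(script, nccl_lib_dir):
    """Run `script` in a fresh Python process that should load the custom NCCL.

    glibc reads LD_LIBRARY_PATH when a process starts, so the variable has to
    be set for a new process rather than changed inside the current one.
    """
    env = with_library_dir_first(nccl_lib_dir)
    env["NCCL_DEBUG"] = "INFO"  # NCCL logs its version when a communicator is created
    return subprocess.run([sys.executable, script], env=env, check=True)
```

With `NCCL_DEBUG=INFO`, the log line reporting the NCCL version at communicator initialization is a quick way to confirm which build was actually picked up.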

osalpekar · December 30, 2020, 3:59am

I believe it’s dynamically linked, but that can apparently be toggled with USE_STATIC_NCCL.
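One way to check which case a given build falls into is to list the dynamic dependencies of PyTorch's CUDA library with `ldd`: a dynamically linked NCCL shows up as a `libnccl.so` entry, a statically linked one does not. This sketch assumes Linux and assumes the library matches `libtorch_cuda*.so` under `torch/lib`; the exact file names differ between releases, and the helper names are hypothetical.

```python
import glob
import os
import subprocess


def nccl_dependencies(ldd_output):
    """Return the libnccl entries found in the text printed by `ldd`."""
    deps = []
    for line in ldd_output.splitlines():
        entry = line.strip()
        name = entry.split(" ", 1)[0]  # e.g. "libnccl.so.2" from "libnccl.so.2 => ..."
        if name.startswith("libnccl"):
            deps.append(entry)
    return deps


def inspect_torch_cuda_libs():
    """Run `ldd` on PyTorch's CUDA libraries and report any libnccl dependency."""
    import torch  # imported here so the parser above works without torch installed

    lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
    for path in sorted(glob.glob(os.path.join(lib_dir, "libtorch_cuda*.so"))):
        out = subprocess.run(
            ["ldd", path], capture_output=True, text=True, check=True
        ).stdout
        deps = nccl_dependencies(out)
        # No entry means NCCL is either linked statically or not used at all.
        print(path, "->", deps if deps else "no dynamic libnccl dependency")
```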

OasisArtisan · December 30, 2020, 4:43am

I see, thanks a lot.

Cheers

OasisArtisan · January 3, 2021, 9:12am

If I may, I have some follow-up questions…

First, I took it that if PyTorch uses its own NCCL submodule, it links to it statically. Is my understanding correct?

Second, is there a way to know what NCCL compilation flags were used to produce the PyTorch binaries installed by Conda?
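A partial answer can be read from the installed binary itself: `torch.cuda.nccl.version()` reports the NCCL version it was built with (an int in older releases, a tuple in newer ones), and `torch.__config__.show()` returns a text summary whose "Build settings:" line lists CMake options such as `USE_NCCL`. This is a sketch; which keys appear in that line varies by release and may not include every NCCL-related flag, and the helper names are hypothetical.

```python
import re

# One "KEY=value" item in the comma-separated "Build settings:" line.
_SETTING = re.compile(r"(?:^|, )([A-Z][A-Z0-9_]*)=([^,]*)")


def build_settings(config_text):
    """Parse the 'Build settings:' line of torch.__config__.show() into a dict."""
    for line in config_text.splitlines():
        _, sep, rest = line.partition("Build settings:")
        if sep:
            return {k: v.strip() for k, v in _SETTING.findall(rest.strip())}
    return {}


def report():
    """Print the NCCL version and any NCCL-related build settings of the installed torch."""
    import torch

    print("NCCL version:", torch.cuda.nccl.version())
    settings = build_settings(torch.__config__.show())
    print({k: v for k, v in settings.items() if "NCCL" in k})
```

Values containing commas (some compiler flag strings, for instance) would be truncated by this simple parser, which is acceptable here since only the NCCL entries are of interest.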

Thanks again.

ptrblck · January 13, 2021, 9:15am

You can see here that NCCL is statically linked into the binaries, and you can take a look at the repository for more information about the build process.