UCC communication layer

Do you know how to build PyTorch with UCC enabled? I want to use ProcessGroupUCC with UCC tracing enabled.

In your source build, you could set these environment variables to build against your system UCC library:

USE_UCC=1 \
    USE_SYSTEM_UCC=1 \
    UCC_HOME="/opt/hpcx/ucc" \
    UCC_DIR="/opt/hpcx/ucc/lib/cmake/ucc" \
    UCX_HOME="/opt/hpcx/ucx" \
    UCX_DIR="/opt/hpcx/ucx/lib/cmake/ucx" \
    python setup.py install

You might need to change the paths, of course.
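
Before starting the build, it can help to confirm that the directories actually exist, since a wrong path is easy to miss in a long build log. This is only a minimal sketch, assuming the HPC-X layout from the command above; the helper name check_ucc_paths is made up for illustration, and the default paths have to be adapted to your installation:

```shell
# Hypothetical helper: verify that every directory passed as an argument exists.
# Prints each missing directory to stderr and returns non-zero if any is missing.
check_ucc_paths() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Defaults taken from the build command above; adjust them to your installation.
UCC_HOME="${UCC_HOME:-/opt/hpcx/ucc}"
UCX_HOME="${UCX_HOME:-/opt/hpcx/ucx}"

if check_ucc_paths "$UCC_HOME" "$UCC_HOME/lib/cmake/ucc" \
                   "$UCX_HOME" "$UCX_HOME/lib/cmake/ucx"; then
    echo "UCC/UCX paths look fine"
else
    echo "fix the paths before running setup.py" >&2
fi
```

If the check reports a missing directory, the *_HOME and *_DIR values in the build command need to point somewhere else.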

Thanks, these are similar to the settings I have tried. With your settings I get the same outcome (I use Ubuntu 20.04).

Note that nothing UCC-related is compiled:

[5809/6884] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Ops.cpp.o
[5810/6884] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Store.cpp.o
[5811/6884] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroup.cpp.o
[5812/6884] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp.o
[5813/6884] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp.o
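
One way to check whether the UCC options reached CMake at all is to look at the cache in the build directory. This is a rough sketch, assuming the default build/ directory of a PyTorch source tree; show_ucc_options is a hypothetical helper name, not part of PyTorch:

```shell
# Hypothetical helper: print all cache entries whose name mentions UCC or UCX
# from the given CMakeCache.txt (defaults to build/CMakeCache.txt).
show_ucc_options() {
    cache="${1:-build/CMakeCache.txt}"
    if [ ! -f "$cache" ]; then
        echo "no CMake cache at $cache" >&2
        return 1
    fi
    # Cache entries have the form NAME:TYPE=VALUE; comment lines start with # or //.
    grep -E '^[A-Za-z_]*UC[CX][A-Za-z_]*:' "$cache"
}

# Run from the root of the PyTorch checkout; prints nothing if no entry matches.
show_ucc_options || true
```

Entries such as USE_UCC or UCC_DIR showing up with unexpected values (or not at all) would suggest that the environment variables were not picked up by the configure step.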

The build fails with the following error:
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o
/usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_C10D_NCCL -DUSE_C10D_UCC -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXPERIMENTAL_CUDNN_V8_API -DUSE_EXTERNAL_MZCRC -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -DUSE_UCC -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -I/home/asidorenko/dev/ML/pytorch/build/aten/src -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-sign-compare -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -DTORCH_CUDA_BUILD_MAIN_LIB -pthread -MD -MT caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o.d -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o -c /home/asidorenko/dev/ML/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp
In file included from /home/asidorenko/dev/ML/pytorch/c10/cuda/CUDAFunctions.h:12,
                 from /home/asidorenko/dev/ML/pytorch/c10/cuda/CUDAStream.h:10,
                 from /home/asidorenko/dev/ML/pytorch/c10/cuda/CUDAGraphsC10Utils.h:3,
                 from /home/asidorenko/dev/ML/pytorch/aten/src/ATen/cuda/CUDAGraph.h:5,
                 from /home/asidorenko/dev/ML/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:2:
/home/asidorenko/dev/ML/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp: In member function ‘void at::cuda::CUDAGraph::debug_dump(const string&)’:
/home/asidorenko/dev/ML/pytorch/aten/src/ATen/cuda/CUDAGraph.cpp:255:27: error: ‘cudaGraphDebugDotPrint’ was not declared in this scope
255 | C10_CUDA_CHECK_WARN(cudaGraphDebugDotPrint(graph_, debug_path.c_str(), 1<<10)); // most verbose output
| ^~~~~~~~~~~~~~~~~~~~~~
/home/asidorenko/dev/ML/pytorch/c10/cuda/CUDAException.h:40:31: note: in definition of macro ‘C10_CUDA_CHECK_WARN’
40 | const cudaError_t err = EXPR;
| ^~~~
[5984/6884] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CuSparseHandlePool.cpp.o