Building PyTorch from source fails with nvlink errors during the 'nccl_external' build step. I followed these instructions.
[386/3538] Performing build step for 'nccl_external'
FAILED: nccl_external-prefix/src/nccl_external-stamp/nccl_external-build nccl/lib/libnccl_static.a
cd /codehub/external/pytorch/third_party/nccl/nccl && env CCACHE_DISABLE=1 SCCACHE_DISABLE=1 make CXX=/usr/bin/c++ CUDA_HOME=/usr/local/cuda NVCC=/usr/local/cuda/bin/nvcc "NVCC_GENCODE=-gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_53,code=sm_53 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75" BUILDDIR=/codehub/external/pytorch/build/nccl VERBOSE=0 -j && /root/anaconda3/bin/cmake -E touch /codehub/external/pytorch/build/nccl_external-prefix/src/nccl_external-stamp/nccl_external-build
make -C src build BUILDDIR=/codehub/external/pytorch/build/nccl
make[1]: Entering directory '/codehub/external/pytorch/third_party/nccl/nccl/src'
Generating nccl.h.in > /codehub/external/pytorch/build/nccl/include/nccl.h
Grabbing include/nccl_net.h > /codehub/external/pytorch/build/nccl/include/nccl_net.h
Compiling init.cc > /codehub/external/pytorch/build/nccl/obj/init.o
Compiling channel.cc > /codehub/external/pytorch/build/nccl/obj/channel.o
Compiling bootstrap.cc > /codehub/external/pytorch/build/nccl/obj/bootstrap.o
Compiling transport.cc > /codehub/external/pytorch/build/nccl/obj/transport.o
Compiling enqueue.cc > /codehub/external/pytorch/build/nccl/obj/enqueue.o
Compiling misc/group.cc > /codehub/external/pytorch/build/nccl/obj/misc/group.o
Compiling misc/nvmlwrap.cc > /codehub/external/pytorch/build/nccl/obj/misc/nvmlwrap.o
Compiling misc/ibvwrap.cc > /codehub/external/pytorch/build/nccl/obj/misc/ibvwrap.o
Compiling misc/rings.cc > /codehub/external/pytorch/build/nccl/obj/misc/rings.o
Compiling misc/utils.cc > /codehub/external/pytorch/build/nccl/obj/misc/utils.o
Compiling misc/argcheck.cc > /codehub/external/pytorch/build/nccl/obj/misc/argcheck.o
Compiling misc/trees.cc > /codehub/external/pytorch/build/nccl/obj/misc/trees.o
Compiling misc/topo.cc > /codehub/external/pytorch/build/nccl/obj/misc/topo.o
Compiling transport/p2p.cc > /codehub/external/pytorch/build/nccl/obj/transport/p2p.o
Compiling transport/shm.cc > /codehub/external/pytorch/build/nccl/obj/transport/shm.o
Compiling transport/net.cc > /codehub/external/pytorch/build/nccl/obj/transport/net.o
Compiling transport/net_socket.cc > /codehub/external/pytorch/build/nccl/obj/transport/net_socket.o
In file included from bootstrap.cc:12:
include/socket.h: In function ‘ncclResult_t connectAddress(int*, socketAddress*)’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from bootstrap.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compiling transport/net_ib.cc > /codehub/external/pytorch/build/nccl/obj/transport/net_ib.o
Compiling collectives/all_reduce.cc > /codehub/external/pytorch/build/nccl/obj/collectives/all_reduce.o
In file included from bootstrap.cc:12:
include/socket.h: In function ‘int findInterfaceMatchSubnet(char*, socketAddress*, socketAddress, int, int)’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from bootstrap.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compiling collectives/all_gather.cc > /codehub/external/pytorch/build/nccl/obj/collectives/all_gather.o
Compiling collectives/broadcast.cc > /codehub/external/pytorch/build/nccl/obj/collectives/broadcast.o
Compiling collectives/reduce.cc > /codehub/external/pytorch/build/nccl/obj/collectives/reduce.o
Compiling collectives/reduce_scatter.cc > /codehub/external/pytorch/build/nccl/obj/collectives/reduce_scatter.o
make[2]: Entering directory '/codehub/external/pytorch/third_party/nccl/nccl/src/collectives/device'
Generating nccl.pc.in > /codehub/external/pytorch/build/nccl/lib/pkgconfig/nccl.pc
Generating rules > /codehub/external/pytorch/build/nccl/obj/collectives/device/Makefile.rules
In file included from bootstrap.cc:12:
include/socket.h: In function ‘ncclResult_t bootstrapNetInit()’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from bootstrap.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bootstrap.cc:44:58: warning: ‘%s’ directive output may be truncated writing up to 1023 bytes into a region of size between 1017 and 1018 [-Wformat-truncation=]
snprintf(line+strlen(line), 1023-strlen(line), " [%d]%s:%s", i, bootstrapNetIfNames+i*MAX_IF_NAME_SIZE,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
socketToString(&bootstrapNetIfAddrs[i].sa, addrline));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from bootstrap.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:64:35: note: ‘__builtin___snprintf_chk’ output 6 or more bytes (assuming 1030) into a destination of size 1023
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from transport/net_socket.cc:9:
include/socket.h: In function ‘ncclResult_t connectAddress(int*, socketAddress*)’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_socket.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from transport/net_ib.cc:9:
include/socket.h: In function ‘ncclResult_t ncclIbInit(ncclDebugLogger_t)’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_ib.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from transport/net_ib.cc:9:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_ib.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from transport/net_socket.cc:9:
include/socket.h: In function ‘ncclResult_t ncclSocketInit(ncclDebugLogger_t)’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_socket.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from transport/net_socket.cc:9:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_socket.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
transport/net_socket.cc:40:58: warning: ‘%s’ directive output may be truncated writing up to 1023 bytes into a region of size between 1017 and 1018 [-Wformat-truncation=]
snprintf(line+strlen(line), 1023-strlen(line), " [%d]%s:%s", i, ncclNetIfNames+i*MAX_IF_NAME_SIZE,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
socketToString(&ncclNetIfAddrs[i].sa, addrline));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_socket.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:64:35: note: ‘__builtin___snprintf_chk’ output 6 or more bytes (assuming 1030) into a destination of size 1023
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from transport/net_ib.cc:9:
include/socket.h: In function ‘ncclResult_t ncclIbConnect(int, void*, void**)’:
include/socket.h:41:16: warning: ‘<’ directive writing 1 byte into a region of size between 0 and 1024 [-Wformat-overflow=]
sprintf(buf, "%s<%s>", host, service);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from include/debug.h:11,
from include/core.h:13,
from transport/net_ib.cc:8:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:33:34: note: ‘__builtin___sprintf_chk’ output between 3 and 1058 bytes into a destination of size 1024
return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i8.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u8.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i32.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u32.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i64.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u64.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_f16.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_f32.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_sum_f64.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_i8.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_u8.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_i32.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_u32.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_i64.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_u64.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_f16.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_f32.o
Compiling all_gather.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_gather_prod_f64.o
...
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_min_f16.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_min_f32.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_min_f64.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_i8.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_u8.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_i32.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_u32.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_i64.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_u64.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_f16.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_f32.o
Compiling all_reduce.cu > /codehub/external/pytorch/build/nccl/obj/collectives/device/all_reduce_max_f64.o
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_f648ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_f648ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_f168ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_f168ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_u648ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_u648ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_i648ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_i648ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z37ncclReduceScatterRingLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z36ncclReduceScatterTreeLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z36ncclReduceScatterRingLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z36ncclReduceScatterTreeLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z36ncclReduceScatterRingLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' with regcount of 96 (target: sm_53)
... (some 10000 more lines of similar nvlink errors omitted) ...
nvlink error : entry function '_Z33ncclAllReduceTreeLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z33ncclAllReduceRingLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z33ncclAllReduceTreeLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z33ncclAllReduceRingLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z32ncclAllReduceTreeLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z32ncclAllReduceRingLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z32ncclAllReduceTreeLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
nvlink error : entry function '_Z32ncclAllReduceRingLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96 (target: sm_53)
Makefile:68: recipe for target '/codehub/external/pytorch/build/nccl/obj/collectives/device/devlink.o' failed
make[2]: *** [/codehub/external/pytorch/build/nccl/obj/collectives/device/devlink.o] Error 255
make[2]: Leaving directory '/codehub/external/pytorch/third_party/nccl/nccl/src/collectives/device'
Makefile:49: recipe for target '/codehub/external/pytorch/build/nccl/obj/collectives/device/colldevice.a' failed
make[1]: *** [/codehub/external/pytorch/build/nccl/obj/collectives/device/colldevice.a] Error 2
make[1]: Leaving directory '/codehub/external/pytorch/third_party/nccl/nccl/src'
Makefile:25: recipe for target 'src.build' failed
make: *** [src.build] Error 2
[387/3538] Generating src/x86_64-fma/blas/shdotxf.py.o
[388/3538] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_generic.dir/src/EmbeddingSpMDM.cc.o
[389/3538] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_generic.dir/src/EmbeddingSpMDMNBit.cc.o
[390/3538] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8Depthwise3DAvx2.cc.o
[391/3538] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwisePerChannelQuantAvx2.cc.o
[392/3538] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8Depthwise3x3Avx2.cc.o
[393/3538] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwiseAvx2.cc.o
ninja: build stopped: subcommand failed.
Building wheel torch-1.5.0a0+9857d9b
-- Building version 1.5.0a0+9857d9b
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/codehub/external/pytorch/torch -DCMAKE_PREFIX_PATH=/root/anaconda3 -DNUMPY_INCLUDE_DIR=/root/anaconda3/lib/python3.7/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/root/anaconda3/bin/python -DPYTHON_INCLUDE_DIR=/root/anaconda3/include/python3.7m -DPYTHON_LIBRARY=/root/anaconda3/lib/libpython3.7m.so.1.0 -DTORCH_BUILD_VERSION=1.5.0a0+9857d9b -DUSE_NUMPY=True /codehub/external/pytorch
cmake --build . --target install --config Release -- -j 8
Traceback (most recent call last):
File "setup.py", line 737, in <module>
build_deps()
File "setup.py", line 316, in build_deps
cmake=cmake)
File "/codehub/external/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
cmake.build(my_env)
File "/codehub/external/pytorch/tools/setup_helpers/cmake.py", line 341, in build
self.run(build_args, my_env)
File "/codehub/external/pytorch/tools/setup_helpers/cmake.py", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/root/anaconda3/lib/python3.7/subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '8']' returned non-zero exit status 1.
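
Since the log contains some 10000 near-identical nvlink errors, it may be easier to read as a summary. Below is a minimal sketch in Python that groups the errors by target architecture and register counts. It assumes every error line has the same shape as the ones pasted above; the function name summarize_nvlink_errors and the regular expression are my own and are not part of PyTorch, NCCL, or the CUDA toolkit.

```python
import re
from collections import Counter

# Pattern for the nvlink error lines in the log above (assumed to be the only shape).
NVLINK_RE = re.compile(
    r"nvlink error\s*: entry function '(?P<entry>[^']+)' "
    r"with max regcount of (?P<max_regs>\d+) "
    r"calls function '(?P<callee>[^']+)' "
    r"with regcount of (?P<regs>\d+) \(target: (?P<target>sm_\d+)\)"
)


def summarize_nvlink_errors(lines):
    """Count errors per (target, entry max regcount, callee regcount)."""
    summary = Counter()
    for line in lines:
        match = NVLINK_RE.search(line)
        if match:
            key = (match["target"], int(match["max_regs"]), int(match["regs"]))
            summary[key] += 1
    return summary


# Two lines copied from the log above, plus one unrelated line that is skipped.
SAMPLE = [
    "nvlink error : entry function '_Z37ncclReduceScatterTreeLLKernel_sum_f648ncclColl' "
    "with max regcount of 80 calls function '_Z24ncclAllReduceRing_sum_i8P14CollectiveArgs' "
    "with regcount of 96 (target: sm_53)",
    "nvlink error : entry function '_Z32ncclAllReduceRingLLKernel_sum_i88ncclColl' "
    "with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' "
    "with regcount of 96 (target: sm_53)",
    "make: *** [src.build] Error 2",
]

for (target, max_regs, regs), count in summarize_nvlink_errors(SAMPLE).items():
    print(f"{target}: {count} error(s), entry max regcount {max_regs}, callee regcount {regs}")
```

To run it on a full log, pass the open log file instead of SAMPLE. In the excerpt pasted here, every nvlink error has target sm_53, an entry max regcount of 80, and a callee regcount of 96.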