Maybe you can follow this discussion:
In my case, I am using Ubuntu 16.04 with CUDA 7.5, and I added -D_FORCE_INLINES to CXXFLAGS in the file torch/lib/nccl/Makefile.
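For reference, here is a rough sketch of the change in torch/lib/nccl/Makefile. It assumes the file sets CXXFLAGS as an ordinary make variable. I have not reproduced the existing flags, so keep whatever is already there and only append the define:

```makefile
# torch/lib/nccl/Makefile (sketch, not the full file)
# Place this line AFTER the existing CXXFLAGS definition so the
# original flags are kept and only the extra define is appended.
CXXFLAGS += -D_FORCE_INLINES
```

After editing, rebuild from a clean tree so the NCCL objects are recompiled with the new flag.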
@shiningsurya @ywu36