Build error in PyTorch Vision

Hi,
I’m excited to contribute to PyTorch Vision, so I want to build it on my local system.
I followed the CONTRIBUTION.md guide, but I’m getting the following error:

/home/khushi/anaconda3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h(2615): error: static assertion failed with "You changed the size of TensorImpl on 64-bit arch.See Note [TensorImpl size constraints] on how to proceed."

/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(231): warning: variable "device_guard" was declared but never referenced

/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(414): warning: variable "device_guard" was declared but never referenced

/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(658): warning: variable "device_guard" was declared but never referenced

/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(763): warning: variable "guard" was declared but never referenced

/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(925): warning: variable "guard" was declared but never referenced

/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(1057): warning: variable "guard" was declared but never referenced

1 error detected in the compilation of "/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu".
error: command '/opt/cuda/bin/nvcc' failed with exit status 1

Could anyone please help me resolve this error?

Dependencies:

  1. CUDA version: 11.4
  2. GCC version: 11.1.0

Thanks
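
When reporting a build failure like this one, it can help to capture the toolchain versions programmatically rather than by hand. The following is a minimal sketch, assuming `nvcc` and `gcc` are on the `PATH` and that their `--version` output contains a version in the forms noted in the comments; the helper names `parse_version` and `tool_version` are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_version(text, pattern):
    """Return the first version matched by `pattern` as a tuple of ints, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def tool_version(command, pattern):
    """Run `command --version` and parse its output; None if the tool is missing."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout, pattern)


# Assumption: nvcc prints a line of the form "Cuda compilation tools, release 11.4, ...".
NVCC_PATTERN = r"release (\d+\.\d+)"
# Assumption: gcc prints a first line of the form "gcc (GCC) 11.1.0".
GCC_PATTERN = r"(\d+\.\d+\.\d+)"

if __name__ == "__main__":
    print("nvcc:", tool_version("nvcc", NVCC_PATTERN))
    print("gcc: ", tool_version("gcc", GCC_PATTERN))
```

A missing tool is reported as `None` instead of raising, so the script can be pasted into a bug report even on a partially configured machine.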

Are you trying to build directly from the master branch or is this already the branch with your changes?
In the latter case, check your git diff as you seem to have changed the size of TensorImpl:

/home/khushi/anaconda3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h(2615): error: static assertion failed with "You changed the size of TensorImpl on 64-bit arch.See Note [TensorImpl size constraints] on how to proceed."

In the former case, which commit are you using?

Hi @ptrblck!
Thanks for taking a look.

I tried both ways: from main and from the branch I created. I got the same error in both cases. I haven’t committed anything yet; I started by setting up the environment.

Answers to the questions you asked:

  1. The git diff command doesn’t output anything.
  2. By “commits”, I’m assuming you are referring to the commands used for building. They are:
conda activate
conda install pytorch -c pytorch-nightly
git clone https://github.com/khushi-411/vision.git
cd vision
python setup.py develop
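
Because the failing static assertion lives in the torch headers under `site-packages`, it may be worth confirming which torch installation and which CUDA toolkit the build actually picks up. This is a rough sketch, not a diagnosis: the function name `torch_build_info` is hypothetical, and it simply returns `None` if torch cannot be imported.

```python
def torch_build_info():
    """Collect details of the installed torch package; None if torch is absent."""
    try:
        import torch
        from torch.utils import cpp_extension
    except ImportError:
        return None
    return {
        # Version string of the installed package, e.g. a nightly build tag.
        "torch_version": torch.__version__,
        # CUDA version torch was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # Header directories that extension builds compile against.
        "include_paths": cpp_extension.include_paths(),
        # CUDA toolkit root detected by torch (None if no toolkit was found).
        "cuda_home": cpp_extension.CUDA_HOME,
    }


if __name__ == "__main__":
    info = torch_build_info()
    if info is None:
        print("torch is not importable in this environment")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

If `built_with_cuda` and the toolkit under `cuda_home` differ, that mismatch is one more thing to mention when asking for help.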

What do you think is happening?

Please check whether CUDA 11.4 supports GCC 11; it probably does not.

Hi @my3bikaht!
Thanks for looking into it.

I checked the official page and a few gists. According to those references, CUDA 11.4 supports GCC 11.
A few notable links worth mentioning are:

  1. https://gist.github.com/ax3l/9489132
  2. https://gist.github.com/ax3l/9489132#gistcomment-3866703
  3. https://docs.nvidia.com/cuda/archive/11.4.0/

I would love to get your input! Thanks

That’s a great suggestion, as I hadn’t noticed the GCC version.

@khushi-411 GCC 11.1, at least, has a known bug with CUDA 11.4 (which is already fixed in CUDA 11.5), so you would need to either downgrade GCC or update CUDA.
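
As a small illustration of the check described above, the sketch below flags only the one pairing reported in this thread (GCC 11.1 with CUDA 11.4). It is not a general compatibility table, and the function names `parse` and `hits_known_issue` are made up for this example.

```python
def parse(version):
    """Turn a dotted version string such as '11.1.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def hits_known_issue(gcc_version, cuda_version):
    """True only for the pairing reported in this thread: GCC 11.1 with CUDA 11.4."""
    gcc_major_minor = parse(gcc_version)[:2]
    cuda_major_minor = parse(cuda_version)[:2]
    return gcc_major_minor == (11, 1) and cuda_major_minor == (11, 4)


if __name__ == "__main__":
    # The configuration from the original post.
    print(hits_known_issue("11.1.0", "11.4"))
```

Any other combination returns `False`, which means "not reported here", not "known to work".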

Hi @ptrblck, Hi @my3bikaht!
Thanks for the suggestions!

I tried to set it up both ways: by switching CUDA (to 11.5 and to 10) and by downgrading GCC to gcc-10.
The major problems I am facing:

  • The Arch Linux package does not seem to have an upstream link for CUDA 11.5 (this is what I found on my side; I might be wrong).
  • I then turned to CUDA 10 using sudo pacman -S cuda-10.0, which failed because the target was not available. I found another command, yay -S cuda-10.1, to install CUDA 10 on Arch Linux. This took more than 6 hours to build.
  • Then I planned to downgrade the GCC version (though I personally wanted to get it working by changing CUDA). I tried many things, but I am currently getting the following error:
/usr/bin/ld: eg: _ZSt3cin: invalid version 2 (max 0)
/usr/bin/ld: eg: error adding symbols: bad value
collect2: error: ld returned 1 exit status
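
One way to try an older compiler without changing the system default is to pass it through the environment for a single build. The sketch below only prepares such an environment. It assumes the build honors the conventional `CC` and `CXX` variables, which is not confirmed in this thread, and the binary names `gcc-10` and `g++-10` are placeholders for whatever the downgraded compiler is called on your system.

```python
import os


def build_env(cc, cxx, base=None):
    """Return a copy of the environment with CC and CXX pointing at the given compilers."""
    env = dict(os.environ if base is None else base)
    env["CC"] = cc    # C compiler (assumed to be picked up by the build)
    env["CXX"] = cxx  # C++ compiler (assumed to be picked up by the build)
    return env


if __name__ == "__main__":
    env = build_env("gcc-10", "g++-10")  # hypothetical binary names
    print(env["CC"], env["CXX"])
    # To launch the build with this environment, something like:
    # subprocess.run(["python", "setup.py", "develop"], env=env, check=True)
```

Copying the environment rather than mutating `os.environ` keeps the override limited to the one build command.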

SYSTEM CONFIGURATION

  • Manjaro Linux 21.0.0

Could you please give me some hints on how to resolve this error?
Thanks!

Hi @ptrblck, Hi @my3bikaht!

A gentle ping to you! Will you please look into the problem?
Thanks!

Sorry, I don’t use Arch Linux. I browsed just now, and Arch Linux has cuda 11.5.0-1 in its packages.
Also, I think you can install cudatoolkit, which has cuda as a dependency, either directly or using conda.
6 hours for yay is crazy; it seems like you were installing from source.

You can also download a specific CUDA version from here: Index of /archive/packages/c/cuda/ or here: Index of /packages/c/cuda/ and install it using 'sudo pacman -U filename'.

The error you mentioned is a generic compiler error; we won’t be able to find the cause this way.

No problem, @my3bikaht.
I’ll try to resolve the error with other methods too. Thanks!