Hi,
I’m excited to contribute to PyTorch Vision, so I want to build it on my local system.
I followed the CONTRIBUTING.md guide, but I’m getting the following error:
/home/khushi/anaconda3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h(2615): error: static assertion failed with "You changed the size of TensorImpl on 64-bit arch.See Note [TensorImpl size constraints] on how to proceed."
/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(231): warning: variable "device_guard" was declared but never referenced
/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(414): warning: variable "device_guard" was declared but never referenced
/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(658): warning: variable "device_guard" was declared but never referenced
/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(763): warning: variable "guard" was declared but never referenced
/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(925): warning: variable "guard" was declared but never referenced
/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu(1057): warning: variable "guard" was declared but never referenced
1 error detected in the compilation of "/home/khushi/Documents/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu".
error: command '/opt/cuda/bin/nvcc' failed with exit status 1
Could anyone please help me resolve this error?
Are you trying to build directly from the master branch or is this already the branch with your changes?
In the latter case, check your git diff as you seem to have changed the size of TensorImpl:
/home/khushi/anaconda3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h(2615): error: static assertion failed with "You changed the size of TensorImpl on 64-bit arch.See Note [TensorImpl size constraints] on how to proceed."
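Since the failing static assertion lives in a header shipped with the installed torch package (the path points into site-packages/torch/include), it can help to record which torch build is installed and which CUDA version it was compiled against. The following is a minimal sketch; the helper name `installed_torch_info` is hypothetical, and it only reports versions rather than diagnosing the error.

```python
import importlib.util


def installed_torch_info():
    """Return the installed torch version and the CUDA version it was built with.

    Both values are None when torch is missing or cannot be imported.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda": None}
    try:
        import torch
    except (ImportError, OSError):
        # torch is present on disk but failed to load (e.g. a broken install).
        return {"torch": None, "cuda": None}
    # torch.version.cuda is None for CPU-only builds.
    return {"torch": torch.__version__, "cuda": torch.version.cuda}


if __name__ == "__main__":
    print(installed_torch_info())
```

Comparing the reported CUDA version with the output of nvcc --version shows whether the toolkit used for the torchvision build matches the one torch was built with.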
I tried both ways: from main and from the branch I created, and got the same error. I haven’t committed anything yet; I have only started setting up the environment.
Answers to the questions you asked:
The git diff command doesn’t output anything.
By commit, I’m assuming you are referring to the commands used for building. They are:
I did check the official page and some gists. According to those references, CUDA 11.4 supports GCC 11.
A few notable links worth mentioning are:
That’s a great suggestion, as I hadn’t checked the GCC version.
@khushi-411 At least GCC 11.1 has a known bug in CUDA 11.4 (which is already fixed in CUDA 11.5), so you would either need to downgrade GCC or update CUDA.
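The compiler pairing described above can be checked before starting a build. This is a rough sketch that parses the text printed by gcc --version and nvcc --version and flags GCC 11 together with CUDA 11.4. The function names are hypothetical, the sample strings are illustrative, and treating every GCC 11.x release as affected is an assumption (the thread only names 11.1).

```python
import re


def parse_gcc_version(text):
    """Extract (major, minor) from the output of `gcc --version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no GCC version found in text")
    return int(match.group(1)), int(match.group(2))


def parse_nvcc_version(text):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no CUDA release found in text")
    return int(match.group(1)), int(match.group(2))


def is_suspect_pair(gcc_version, cuda_version):
    """True for the pairing reported in this thread: GCC 11.x with CUDA 11.4.

    Assumption: all GCC 11 minor versions are treated as suspect.
    """
    return gcc_version[0] == 11 and cuda_version == (11, 4)


# Illustrative sample output, not copied from the original poster's machine.
sample_gcc = "gcc (GCC) 11.1.0"
sample_nvcc = "Cuda compilation tools, release 11.4, V11.4.48"

if is_suspect_pair(parse_gcc_version(sample_gcc), parse_nvcc_version(sample_nvcc)):
    print("GCC 11 with CUDA 11.4: consider downgrading GCC or updating CUDA")
```

On a real system the two strings would come from running the commands, for example with subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout.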
I tried to set it up both ways: by changing the CUDA version (to CUDA 11.5 and to CUDA 10) and by downgrading GCC to gcc-10. The major problems I am facing:
The Arch Linux package does not seem to have any upstream link for CUDA 11.5 (I may have missed it).
I then turned to CUDA 10 using sudo pacman -S cuda-10.0 (this failed, since the target was not available). I found another command, yay -S cuda-10.1, to install CUDA 10 on Arch Linux. This took more than 6 hours to build.
Then I planned to downgrade the GCC version (though I personally wanted to get this working by changing the CUDA version). I tried many things, but currently I am getting the following error:
/usr/bin/ld: eg: _ZSt3cin: invalid version 2 (max 0)
/usr/bin/ld: eg: error adding symbols: bad value
collect2: error: ld returned 1 exit status
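For the GCC downgrade route, one common approach is to leave the system compiler alone and point only the build at the older one through the conventional CC and CXX environment variables. The sketch below assumes the older compiler is available as gcc-10 and g++-10 (these binary names are an assumption), and the helper `build_env` is hypothetical. Whether the torchvision build and nvcc actually pick these variables up should be verified, and this is not known to fix the linker error shown above.

```python
import os


def build_env(cc="gcc-10", cxx="g++-10", base=None):
    """Return a copy of the environment with CC and CXX set to an older GCC.

    The default binary names are assumptions; adjust them to what is installed.
    """
    env = dict(os.environ if base is None else base)
    env["CC"] = cc
    env["CXX"] = cxx
    return env


env = build_env()
print(env["CC"], env["CXX"])

# The environment can then be passed to the build step, for example:
#   subprocess.run([sys.executable, "setup.py", "develop"], env=env, check=True)
# (the exact build command is whatever the contributing guide specifies).
```

nvcc itself selects its host compiler through the -ccbin option, so if the build ignores CC and CXX, that option is the next thing to look at.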
SYSTEM CONFIGURATION
Manjaro Linux 21.0.0
Will you please give me some hints to resolve the error?
Thanks!
Sorry, I am not using Arch Linux. I just checked, and Arch Linux has cuda 11.5.0-1 in its packages.
Also, I think you can install cudatoolkit, which has cuda as a dependency, either directly or using conda.
6 hours for yay is crazy; it seems like you were installing from source.