Problem: RuntimeError: operator torchvision::nms does not exist

I’m working on Win11, WSL 2, Ubuntu 22.04.

The problem is the error shown in the title. It appeared while I was setting up an environment for the Mamba architecture.

To reproduce:

conda create -n cmamba2 python=3.10
conda activate cmamba2
conda install -c pytorch -c conda-forge -c nvidia timm==0.6.5 pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=11.8
conda install -c conda-forge triton

Then I checked the Python, PyTorch and CUDA versions, since they determine which .whl files I should download.

The commands and their outputs are shown below, in order.

python -V
Python 3.10.16
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
conda list cuda
# packages in environment at /root/anaconda3/envs/cmamba2:
#
# Name                    Version                   Build  Channel
cuda-cudart               11.8.89                       0    nvidia
cuda-cupti                11.8.87                       0    nvidia
cuda-libraries            11.8.0                        0    nvidia
cuda-nvrtc                11.8.89                       0    nvidia
cuda-nvtx                 11.8.86                       0    nvidia
cuda-runtime              11.8.0                        0    nvidia
cuda-version              11.8                 h70ddcb2_3    conda-forge
cudatoolkit               11.8.0              h4ba93d1_13    conda-forge
pytorch-cuda              11.8                 h7e8668a_6    pytorch
conda list pytorch
# packages in environment at /root/anaconda3/envs/cmamba2:
#
# Name                    Version                   Build  Channel
pytorch                   2.3.0           cuda118_py310h954aa82_301    conda-forge
pytorch-cuda              11.8                 h7e8668a_6    pytorch
pytorch-mutex             1.0                        cuda    pytorch

I also ran a command to check whether GLIBCXX_USE_CXX11_ABI is true. I think it indicates which C++ ABI was used to compile the packages, but I don't fully understand it.

python -c 'import torch; print(torch._C._GLIBCXX_USE_CXX11_ABI); print(torch.compiled_with_cxx11_abi())'

Its output is

True
True

So I downloaded the two .whl files below:

causal_conv1d-1.5.0.post8+cu11torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
mamba_ssm-2.2.2+cu118torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl

from Releases · Dao-AILab/causal-conv1d · GitHub and Releases · state-spaces/mamba · GitHub, respectively.

Both .whl files are stored in a folder called whl, which is a subfolder of the main project folder.

Then I moved into the whl folder with cd and ran the following two commands to install them:
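To double-check that the tags in the file names line up with the versions printed above, something like the following could be used. It is only a rough sketch: the helper names (wheel_tags, major_minor, mismatches) are made up for illustration, and the regular expression assumes the naming pattern of these two files rather than any official specification. The versions passed in are the ones from the outputs above; in a live session they could instead come from torch.__version__, torch.version.cuda and torch.compiled_with_cxx11_abi().

```python
import re

# Assumed naming pattern, taken from the two wheel names in this post:
#   <name>-<version>+cu<CUDA>torch<X.Y>cxx11abi<TRUE|FALSE>-cp<PY>-...
WHEEL_TAGS = re.compile(
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-cp(?P<py>\d+)-"
)


def wheel_tags(filename):
    """Extract the CUDA, torch, ABI and Python tags from a wheel file name."""
    match = WHEEL_TAGS.search(filename)
    if match is None:
        raise ValueError(f"unrecognised wheel name: {filename}")
    py = match.group("py")  # e.g. "310" for CPython 3.10
    return {
        "cuda": match.group("cuda"),
        "torch": match.group("torch"),
        "cxx11abi": match.group("abi") == "TRUE",
        "python": f"{py[0]}.{py[1:]}",
    }


def major_minor(version):
    """Reduce a version such as '3.10.16' to '3.10'."""
    return ".".join(version.split(".")[:2])


def mismatches(tags, python_version, torch_version, cuda_version, cxx11abi):
    """Return the names of the tags that disagree with the environment."""
    problems = []
    if tags["python"] != major_minor(python_version):
        problems.append("python")
    if tags["torch"] != major_minor(torch_version):
        problems.append("torch")
    # The wheel may carry only the CUDA major version ("11") or major+minor ("118").
    if not cuda_version.replace(".", "").startswith(tags["cuda"]):
        problems.append("cuda")
    if tags["cxx11abi"] != cxx11abi:
        problems.append("cxx11abi")
    return problems


wheels = [
    "causal_conv1d-1.5.0.post8+cu11torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl",
    "mamba_ssm-2.2.2+cu118torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl",
]

for name in wheels:
    # Versions below are copied from the command outputs earlier in this post.
    print(name, mismatches(wheel_tags(name), "3.10.16", "2.3.0", "11.8", True))
```

An empty list for a wheel means its tags agree with the environment on all four points, which suggests (but does not prove) that the wheel choice itself is not the source of the error.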

pip install causal_conv1d-1.5.0.post8+cu11torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
pip install mamba_ssm-2.2.2+cu118torch2.3cxx11abiTRUE-cp310-cp310-linux_x86_64.whl

Then I started Python and imported torchvision:

python
import torchvision

The error appears. Full output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/cmamba2/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/root/anaconda3/envs/cmamba2/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/root/anaconda3/envs/cmamba2/lib/python3.10/site-packages/torch/library.py", line 467, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/root/anaconda3/envs/cmamba2/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
nvidia-smi
Tue Apr 15 17:30:07 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 572.83         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   40C    P8             18W /  285W |     788MiB /  16376MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              28      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

I’m a newcomer to deep learning and don’t have much experience with environment configuration. I believe the CUDA version installed on the operating system should match the CUDA version used to compile PyTorch. Is that true, is it unnecessary, or have I misunderstood something? I also wonder whether all the packages listed need to come from the same channel. If so, which channel is best? I’m working on my Nvidia RTX 4090 GPU, so is the nvidia channel the best choice?
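On the channel question, a quick way to see which channel each package came from is to group the conda list output by its last column. The sketch below does that on text pasted from the listing earlier in this post. The function names are hypothetical, and it assumes the four-column layout (name, version, build, channel) shown above; a row with no channel column is reported as "(no channel shown)" rather than guessed.

```python
from collections import defaultdict


def channels_by_package(conda_list_output):
    """Map package name -> channel from the text output of `conda list`."""
    result = {}
    for line in conda_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        channel = fields[3] if len(fields) > 3 else "(no channel shown)"
        result[fields[0]] = channel
    return result


def packages_by_channel(mapping):
    """Invert the mapping: channel -> sorted list of package names."""
    grouped = defaultdict(list)
    for name, channel in sorted(mapping.items()):
        grouped[channel].append(name)
    return dict(grouped)


# Rows copied from the `conda list pytorch` output earlier in this post.
SAMPLE = """\
# Name                    Version                   Build  Channel
pytorch                   2.3.0           cuda118_py310h954aa82_301    conda-forge
pytorch-cuda              11.8                 h7e8668a_6    pytorch
pytorch-mutex             1.0                        cuda    pytorch
"""

grouped = packages_by_channel(channels_by_package(SAMPLE))
for channel, names in grouped.items():
    print(channel, "->", ", ".join(names))
```

On the listing above this shows pytorch itself coming from conda-forge while pytorch-cuda and pytorch-mutex come from the pytorch channel. The listing alone does not prove that this mix caused the error, but it is the kind of inconsistency worth checking.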

I have since solved this problem with help from a friend. I’m now trying to figure out how it was solved and what was actually wrong with my environment and configuration.