No kernel image is available for execution on the device (PyTorch, Quadro K4200)

The error is still the same:

c++: internal compiler error: Killed (program cc1plus)

Check if you are running out of memory during the build and, if so, reduce the number of parallel jobs, e.g. via export MAX_JOBS=2, before building from source.
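A minimal sketch of that workaround, assuming you are building inside the pytorch source checkout (the value 2 is just an example; tune it to your available RAM):

```shell
# Cap the number of parallel compile jobs to reduce peak memory usage
# (2 is an example value; pick what your RAM can sustain).
export MAX_JOBS=2
python setup.py develop
```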

Hi,
As you suggested, setting MAX_JOBS worked and the PyTorch build completed. But when I import torch I get this error:

NameError: name 'sympy' is not defined

I tried to install sympy, but it requires Python 3.8, so I created another conda env with Python 3.8 and built torch again. This time the build took only 5 seconds; when I imported torch I got this error:

import torch
Segmentation fault (core dumped)

Following the Segmentation Fault when importing PyTorch - #4 by ptrblck link, I tried this:

(fr1) root@ca1573a3247c:/# gdb python3 r -c "imort torch" bt
Excess command line arguments ignored. (r ...)
GNU gdb (Ubuntu 10.2-0ubuntu1~18.04~2) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
/imort

This sounds a bit strange, but I guess the build command has just moved the locally built files to the current env?
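One thing worth checking with segfault-on-import problems is which files Python actually resolves torch to, since a stale install or running Python from inside the source checkout can shadow the freshly built package. A small sketch, demonstrated with a stdlib module so the snippet is self-contained (torch itself may not import cleanly here):

```python
import importlib.util

def module_origin(name):
    """Return the file path a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Call module_origin("torch") from OUTSIDE the pytorch source directory to see
# whether it resolves to the env's site-packages or to the source checkout.
print(module_origin("json"))
```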

You have a typo in the import statement.

(fr1) root@ca1573a3247c:/# gdb python3 r -c "import torch" bt
Excess command line arguments ignored. (r ...)
GNU gdb (Ubuntu 10.2-0ubuntu1~18.04~2) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
/import torch: No such file or directory.
(gdb)
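As an aside, the trailing arguments in that invocation are parsed by gdb itself, which is why it reports "Excess command line arguments ignored" and then tries to read a file named /import torch. A sketch of the intended invocation (assuming gdb is installed in the env): gdb's own commands go in -ex options, and --args forwards everything after it to python3:

```shell
gdb -ex run -ex bt --args python3 -c "import torch"
```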

Hi,
now there is no error, but PyTorch is using the CPU only, not the GPU.
I got this at the end of the build process:

Installed /root/miniconda3/envs/fr/lib/python3.8/site-packages/mpmath-1.2.1-py3.8.egg
Searching for typing-extensions==4.4.0
Best match: typing-extensions 4.4.0
Adding typing-extensions 4.4.0 to easy-install.pth file

Using /root/miniconda3/envs/fr/lib/python3.8/site-packages
Finished processing dependencies for torch==2.0.0a0+git219e953

and when I run this code:
import torch

if torch.cuda.is_available():
    dev = "cuda:0"
else:
    dev = "cpu"
print(dev)
a = torch.zeros(4, 3)
a = a.to(dev)
print(a)

output:
cpu
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

Did the install log show a properly detected CUDA toolkit as well as the desired compute capabilities?

-- Building with NumPy bindings
-- Not using cuDNN
-- Not using CUDA
-- Using MKLDNN
-- Not using Compute Library for the Arm architecture with MKLDNN
-- Not using CBLAS in MKLDNN
-- Not using NCCL
-- Building with distributed package:
--   USE_TENSORPIPE=True
--   USE_GLOO=True
--   USE_MPI=False
-- Building Executorch
-- Using ITT
-- Not Building nvfuser

The documentation says that python setup.py develop is for a CPU build. Do I need to do anything more for a GPU build?

Yes, you need to install a CUDA toolkit locally and make sure the CUDA compiler (nvcc) is able to build applications for your device.

Hi @ptrblck

I was trying to install CUDA toolkit 10.2 in my docker container using the following commands:

wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run

but I am getting this error while installing it:

(base) root@9e512f49e89c:/# cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version…
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 440.33.01
[INFO]: Executing NVIDIA-Linux-x86_64-440.33.01.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 440.33.01 failed, quitting

Even though I used the nvidia/cuda:10.2-cudnn8-runtime-ubuntu18.04 docker image, running nvidia-smi inside the container shows CUDA version 11.4. So I tried the nvidia/cuda:11.4.0-runtime-ubuntu20.04 docker image, installed CUDA toolkit 11.6, and got the error below while building PyTorch:

make[2]: Entering directory '/pytorch/third_party/nccl/nccl/src/collectives/device'
Generating rules > /pytorch/build/nccl/obj/collectives/device/Makefile.rules
Copying sendrecv.cu > /pytorch/build/nccl/obj/collectives/device/sendrecv_sum_i8.cu
nvcc fatal : Unsupported gpu architecture 'compute_30'
make[2]: *** [Makefile:53: /pytorch/build/nccl/obj/collectives/device/sendrecv.dep] Error 1
make[2]: Leaving directory '/pytorch/third_party/nccl/nccl/src/collectives/device'
make[1]: *** [Makefile:51: /pytorch/build/nccl/obj/collectives/device/colldevice.a] Error 2
make[1]: *** Waiting for unfinished jobs...
make[1]: Leaving directory '/pytorch/third_party/nccl/nccl/src'
make: *** [Makefile:25: src.build] Error 2
[63/7012] Building CXX object third_party/protobuf/cmake/CMakeFiles/libprotoc.dir/__/src/google/protobuf/compiler/cpp/cpp_message.cc.o
ninja: build stopped: subcommand failed.

If you have trouble installing the CUDA toolkit inside a container, use the CUDA development containers (the -devel images), which ship with the CUDA toolkit already installed. Note that nvidia-smi reports the CUDA version supported by the driver, not the locally installed toolkit. You can check the compiler version via nvcc --version.
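Two sketches related to the errors above, assuming the runfile installer and a pytorch source checkout (the flags are standard installer/build options; values are examples). Inside a container the GPU driver comes from the host, so install the toolkit only; and the NCCL failure comes from targeting compute_30, which NCCL does not support, while a single-GPU build does not need NCCL at all:

```shell
# Toolkit-only install: skips the driver component that failed above.
sh cuda_10.2.89_440.33.01_linux.run --silent --toolkit

# Skip NCCL and restrict the build to the card's compute capability
# (sm_30 also requires a CUDA 10.x toolkit; CUDA 11 dropped it).
export USE_NCCL=0
export TORCH_CUDA_ARCH_LIST="3.0"
python setup.py develop
```

Whether current PyTorch source still compiles for sm_30 is a separate question; the thread ultimately resolves the issue by moving to a newer GPU.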

Hello Mr @ptrblck,
I have changed my graphics card to an M4000, which has compute capability 5.2, and the issue was resolved.
Thank you for assisting me.