CUDA configuration with libtorch C++ and training time control for CNN model

I am trying to implement a CNN model with libtorch C++ on WSL2. I have:

  • NVIDIA T1200 Laptop GPU with 4 GiB of memory
  • 12-core CPU
  • CUDA Toolkit 11.7 installed
  • cuDNN installed
  • PyTorch 1.13.1+cu117 installed

Then I made the Torch C++ API (libtorch) available to CMake by pointing it at the PyTorch installation with Torch_DIR=`python -c "import torch;print(torch.utils.cmake_prefix_path)"`.

Configuring the build with cmake .. gives the following output, with two warnings:

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda-11.7 (found version "11.7")
-- Found CUDA: /usr/local/cuda-11.7 (found version "11.7")
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Caffe2: CUDA detected: 11.7
-- Caffe2: CUDA nvcc is: /usr/local/cuda-11.7/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-11.7
-- Caffe2: Header version is: 11.7
-- Found CUDNN: /usr/local/cuda-11.7/lib64/
-- Found cuDNN: v8.9.1 (include: /usr/local/cuda-11.7/include, library: /usr/local/cuda-11.7/lib64/
-- /usr/local/cuda-11.7/lib64/ shorthash is d833c4f3
-- Autodetected CUDA architecture(s): 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
CMake Warning at /home/amadou/anaconda3/envs/test_env/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/amadou/anaconda3/envs/test_env/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:29 (find_package)
-- Found Torch: /home/amadou/anaconda3/envs/test_env/lib/python3.8/site-packages/torch/lib/
-- Configuring done
CMake Warning at CMakeLists.txt:37 (add_executable):
  Cannot generate a safe runtime search path for target cyconv because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [] in /usr/lib/x86_64-linux-gnu may be hidden by files in:

  Some of these libraries may not be found correctly.

-- Generating done
-- Build files have been written to: /home/amadou/Development/cyconv_with_pytorch/build

I then run make and train my CNN model. The problem is the training time on the GPU: it is slower than when I run the same model on the CPU (i.e. without CUDA).
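In case the way the time is measured matters for the diagnosis: CUDA kernels are launched asynchronously, and the first iterations on a GPU include one-time setup costs, so a fair CPU/GPU comparison needs some untimed warm-up steps and a synchronization before the clock is read. Below is a minimal sketch of such a measurement. It is illustrative only and not my actual training code; `average_step_ms`, `step` and `sync` are hypothetical names, and the sketch uses only the C++ standard library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not from my project): average wall-clock time of one
// training step, in milliseconds.
//   step   - runs one forward/backward/optimizer step
//   sync   - blocks until all queued device work has finished; a no-op is
//            enough for a CPU-only run
//   warmup - untimed iterations that absorb one-time startup costs
//   iters  - timed iterations
double average_step_ms(const std::function<void()>& step,
                       const std::function<void()>& sync,
                       std::size_t warmup,
                       std::size_t iters) {
    assert(step && sync);  // both callables must be set
    if (iters == 0) return 0.0;

    for (std::size_t i = 0; i < warmup; ++i) step();
    sync();  // finish the warm-up work before starting the clock

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) step();
    sync();  // wait for pending device work before stopping the clock
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

For a GPU run with libtorch, `sync` could be a lambda that calls `torch::cuda::synchronize()`, and `step` a lambda around one iteration of the training loop. Calling the helper once with the model and data on the CPU and once on the CUDA device would then give two comparable numbers.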

I suspect a problem with the CUDA and Torch configuration on the C++ side. My code runs without errors; the only issue is that training takes too long.

I can't work out why I'm not actually getting the benefit of the GPU. Can you help me find the reason?

Thanks in advance!