I use libtorch on an RTX 3090, but it throws:
terminate called after throwing an instance of 'std::runtime_error'
what(): nvrtc: error: invalid value for --gpu-architecture (-arch)
Configuration:
pytorch v1.7.0 -> libtorch
CUDA 11.0
Could you post an executable code snippet to reproduce this issue?
I’ve used libtorch==1.7.0 + CUDA 11.0 and adapted the C++ export tutorial to reproduce this issue on a 3090, but the code works fine:
#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
    module.to(torch::kCUDA);
    auto tensor = torch::randn({1, 3, 224, 224}).to(torch::kCUDA);
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(tensor);
    auto output = module.forward(inputs).toTensor();
    std::cout << output << std::endl;
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
  return 0;
}
Executed via:
cmake -DCMAKE_PREFIX_PATH=/workspace/src/libtorch .. && cmake --build . --config Release && ./example-app ../traced_resnet_model.pt
The above code also works fine for me. The module in my project is similar to this snippet, and it runs fine when I extract it from the project, but the error above always occurs when I run the full project. I don’t know why.
I rebuilt libtorch with CUDA 11.1, which solved the problem. (The RTX 3090 has compute capability 8.6; the nvrtc shipped with CUDA 11.0 cannot target sm_86, which is what produces the invalid --gpu-architecture error. Support for sm_86 was added in CUDA 11.1.)
If the input is batched, for example 32 images (batch size = 32) where each image is (3, 224, 224), what should I do? Just use method 1 below?
method 1:
auto tensor = torch::randn({batch_size, 3, 224, 224}).to(torch::kCUDA);
std::vector<torch::jit::IValue> inputs;
inputs.push_back(tensor);
auto output = module.forward(inputs).toTensor();
In this method, the variable inputs is a std::vector whose size is always 1 no matter what the batch size is, because we call push_back only once. If push_back is only ever called once, why use std::vector<torch::jit::IValue>, which can store multiple elements? I tried to put the batch dimension into that vector instead, as in method 2 below, but it doesn't seem to work.
method 2:
std::vector<torch::jit::IValue> inputs;
for (int i = 0; i < 32; i++) {
  auto tensor = torch::randn({3, 224, 224}).to(torch::kCUDA);
  inputs.push_back(tensor);
}
auto output = module.forward(inputs).toTensor();
You could push the tensors into a std::vector<torch::Tensor>, use torch::cat to create an input batch, and then pass that batch to the model:
// Create a vector of single-sample tensors.
std::vector<torch::Tensor> inputs;
for (int i = 0; i < 32; i++) {
  inputs.push_back(torch::ones({1, 3, 224, 224}));
}
// Concatenate along dim 0 to build a (32, 3, 224, 224) batch.
auto input = torch::cat(inputs);

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward({input}).toTensor();