I use libtorch on an RTX 3090, but it throws:
terminate called after throwing an instance of 'std::runtime_error'
what(): nvrtc: error: invalid value for --gpu-architecture (-arch)
Configuration:
pytorch v1.7.0 -> libtorch
CUDA 11.0
Could you post an executable code snippet to reproduce this issue?
I’ve used libtorch==1.7.0 + CUDA 11.0 and adapted the C++ export tutorial to reproduce this issue on a 3090, but the code works fine:
#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
    module.to(torch::kCUDA);
    auto tensor = torch::randn({1, 3, 224, 224}).to(torch::kCUDA);
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(tensor);
    auto output = module.forward(inputs).toTensor();
    std::cout << output << std::endl;
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
  return 0;
}
Executed via:
cmake -DCMAKE_PREFIX_PATH=/workspace/src/libtorch .. && cmake --build . --config Release && ./example-app ../traced_resnet_model.pt
The above code also works fine for me. The module in my project is similar to this snippet, and it runs fine when I extract it from the project, but the error above always occurs when I run the full project. I don’t know why.
I rebuilt libtorch with CUDA 11.1, which solved the problem. (The RTX 3090 has compute capability 8.6; the nvrtc shipped with CUDA 11.0 cannot target sm_86, which is what produces the invalid --gpu-architecture error. Support for sm_86 was added in CUDA 11.1.)
If the input is batched, for example 32 images (batch size = 32) where each image is (3, 224, 224), what should I do? Just use method 1 below?
method 1:
auto tensor = torch::randn({batch_size, 3, 224, 224}).to(torch::kCUDA);
std::vector<torch::jit::IValue> inputs;
inputs.push_back(tensor);
auto output = module.forward(inputs).toTensor();
In this method, the variable inputs is a std::vector whose size is always 1 no matter what the batch size is, because we call push_back only once. If push_back is only ever called once, why use std::vector<torch::jit::IValue>, which can store multiple elements? I tried to put the batch dimension into that vector instead, as in method 2 below, but it doesn't seem to work.
method 2:
std::vector<torch::jit::IValue> inputs;
for (int i = 0; i < 32; i++) {
  auto tensor = torch::randn({3, 224, 224}).to(torch::kCUDA);
  inputs.push_back(tensor);
}
auto output = module.forward(inputs).toTensor();
You could push the tensors into a std::vector<torch::Tensor>, use torch::cat to create an input batch, and then pass that batch to the model:
// Create a vector of single-sample tensors.
std::vector<torch::Tensor> inputs;
for (int i = 0; i < 32; i++) {
  inputs.push_back(torch::ones({1, 3, 224, 224}));
}
// Concatenate along dim 0 to build a (32, 3, 224, 224) batch.
auto input = torch::cat(inputs);

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward({input}).toTensor();