torch::jit::script::Module hangs forever on loading or on the first inference pass

Hi everyone, I’m not sure if this is the right place to post this issue or if it’s more CUDA-related.
I have a very simple program that should load a torch::jit::script::Module and run a few forward passes on a dummy tensor.
However, depending on the model I use, it gets stuck either when it tries to load the model (model1.pt) or when it runs the first forward pass (model2.pt), and I have to kill the process manually.
I’m working on a Windows 10 laptop with an NVIDIA GeForce RTX 3080 and:

  • CUDA Toolkit 10.2
  • LibTorch 1.10.1 for CUDA 10.2 (both Release and Debug)
  • Visual Studio 2019

The output of nvidia-smi is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 466.81       Driver Version: 466.81       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A    0C    P5    22W /  N/A |    183MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I built a VS project with this template. The code is the following:

#include <iostream>
#include <string>
#include <vector>

#include <ATen/Context.h>
#include <torch/torch.h>
#include <torch/script.h>

int main() {
    const torch::Device device(torch::kCUDA, 0);
    torch::jit::script::Module model;
    std::string MODEL_PATH = "path/to/model1.pt";
    // std::string MODEL_PATH = "path/to/model2.pt";

    // Load the TorchScript model directly onto the GPU.
    std::cout << "Trying to load model..." << std::endl;
    try {
        model = torch::jit::load(MODEL_PATH, device);
        model.eval();
        std::cout << "AI model loaded successfully." << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << e.what() << std::endl;
        return -1;
    }

    std::cout << "Warming up model..." << std::endl;

    // Dummy input tensor on the GPU.
    auto dummy = torch::zeros({ 1, 3, 512, 512 }).to(device);
    torch::Tensor output;
    std::vector<torch::jit::IValue> inputs;
    inputs.emplace_back(dummy);

    std::cout << "\tIteration: ";

    // Run a few warm-up forward passes, synchronizing after each one.
    for (int i = 0; i < 20; i++) {
        std::cout << i << ", ";
        output = model.forward(inputs).toTensor();
        torch::cuda::synchronize();
    }
    return 0;
}

Output for model1.pt

Trying to load model...

Output for model2.pt

Trying to load model...
AI model loaded successfully.
Warming up model...
        Iteration: 0,

I can’t figure out what the problem could be or what I could do to investigate it further.
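One basic check I can think of is to verify that LibTorch actually sees the GPU before touching either model. This is just a sketch and only uses the standard torch::cuda::is_available() / torch::cuda::device_count() / torch::cuda::cudnn_is_available() calls; if even the tiny GPU op at the end hangs, the problem would be the CUDA setup rather than the TorchScript models:

#include <iostream>

#include <torch/torch.h>

int main() {
    // Report whether this LibTorch build was compiled with CUDA and can see a GPU.
    std::cout << std::boolalpha
              << "CUDA available:  " << torch::cuda::is_available() << "\n"
              << "Device count:    " << torch::cuda::device_count() << "\n"
              << "cuDNN available: " << torch::cuda::cudnn_is_available() << std::endl;

    // Tiny GPU op: allocate on the GPU, reduce, and copy the result back to the host.
    auto t = torch::ones({ 2, 2 }, torch::kCUDA);
    std::cout << "Sum on GPU: " << t.sum().item<float>() << std::endl;
    return 0;
}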

Follow-up: On the RTX 30 series you must use at least CUDA 11.1, since it is the first CUDA version that supports the NVIDIA Ampere architecture (source: CUDA 11.1 Release Notes, chapter 1.2).
After installing CUDA Toolkit 11.3 and LibTorch for CUDA 11.3, both models started working properly.
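
For anyone hitting the same hang: a quick way to confirm the mismatch from inside LibTorch is to print the GPU's compute capability (the RTX 3080 reports 8.6, which the CUDA 10.2 binaries were never built for). This is only a sketch; it assumes ATen's at::cuda::getCurrentDeviceProperties() from ATen/cuda/CUDAContext.h, which I believe ships with the CUDA builds of LibTorch:

#include <iostream>

#include <ATen/cuda/CUDAContext.h>
#include <torch/torch.h>

int main() {
    if (!torch::cuda::is_available()) {
        std::cerr << "CUDA is not available in this LibTorch build." << std::endl;
        return 1;
    }
    // Ampere GPUs such as the RTX 3080 report compute capability 8.6,
    // which is only supported from CUDA 11.1 onwards.
    const cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties();
    std::cout << prop->name << " - compute capability "
              << prop->major << "." << prop->minor << std::endl;
    return 0;
}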