I’m trying to save a serialized TensorRT-optimized model using torch_tensorrt in one environment and then load it in another environment (different GPUs: one has a Quadro M1000M, the other a Tesla P100).
In both environments I don’t have full sudo access (i.e. I can’t change the NVIDIA driver), but I can install different CUDA toolkits locally and install packages via pip wheels.
Here is my setup in each environment:
env #1 =
Tesla P100
NVIDIA driver 460
CUDA 11.3 (checked via torch.version.cuda); nvidia-smi shows 11.2; many CUDA versions installed, from 10.2 to 11.4
cuDNN 8.2.1.32
TensorRT 8.2.1.8
Torch-TensorRT 1.0.0
PyTorch 1.10.1+cu113 (conda installed)
env #2 =
Quadro M1000M
NVIDIA driver 455
CUDA 11.3 (checked via torch.version.cuda; running in compatibility mode, I believe, since 11.3 technically requires a 460+ NVIDIA driver according to the compatibility table); nvidia-smi shows 11.1; 10.2 is also available aside from the 11.3 I installed
cuDNN 8.2.1.32
TensorRT 8.2.1.8
Torch-TensorRT 1.0.0
PyTorch 1.10.1+cu113 (pip installed)
So, as you can see, the only real differences are the GPU and the NVIDIA driver (455 vs. 460).
Is this supposed to work?
On env #1, I can compile any model with torch_tensorrt.
On env #2, I run into issues if I try to compile even slightly complex models (e.g. resnet34), where it says:
WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 1: [wrapper.cpp::plainGemm::197] Error Code 1: Cublas (CUBLAS_STATUS_NOT_SUPPORTED)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
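For reference, the compile-and-save step that works on env #1 looks roughly like this (a minimal sketch against the torch_tensorrt 1.0.0 API; the input shape, file path, and helper name are placeholders I chose for illustration):

```python
def compile_and_save(model, path, shape=(1, 3, 224, 224)):
    """Compile a model with Torch-TensorRT and save the resulting
    TorchScript module, which embeds the serialized TensorRT engine."""
    # Imported lazily: both require a CUDA-capable GPU at runtime.
    import torch
    import torch_tensorrt

    trt_module = torch_tensorrt.compile(
        model.eval().cuda(),
        inputs=[torch_tensorrt.Input(shape=shape, dtype=torch.float32)],
        enabled_precisions={torch.float32},
    )
    # torch.jit.save serializes the module, engine included.
    torch.jit.save(trt_module, path)
    return trt_module
```

For example, `compile_and_save(torchvision.models.resnet34(pretrained=True), "resnet34_trt.ts")` would be the call that succeeds on env #1 but hits the cuBLAS error above on env #2.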
If I try to torch.jit.load any model made in env #1 (even the simplest ones, like a model with a single Conv2d layer) on env #2, I get the following error message:
~/.local/lib/python3.6/site-packages/torch/jit/_serialization.py in load(f, map_location, _extra_files)
159 cu = torch._C.CompilationUnit()
160 if isinstance(f, str) or isinstance(f, pathlib.Path):
→ 161 cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
162 else:
163 cpp_module = torch._C.import_ir_module_from_buffer(
RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:44] Expected most_compatible_device to be true but got false
No compatible device was found for instantiating TensorRT engine
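For context, the failing step is just a plain torch.jit.load of the module saved on env #1 (a sketch; the path and function name are placeholders). One plausible cause, hedged here rather than confirmed: a serialized TensorRT engine is built for the GPU architecture it was compiled on, and the P100 (Pascal) and M1000M (Maxwell) have different compute capabilities, which would match the "No compatible device" message:

```python
def load_trt_module(path, device="cuda:0"):
    """Load a Torch-TensorRT compiled TorchScript module.

    Deserializing the embedded TensorRT engine requires a GPU compatible
    with the one it was built on; otherwise torch.jit.load raises the
    "No compatible device" RuntimeError shown above.
    """
    import torch  # lazy import so the sketch stays importable anywhere

    return torch.jit.load(path, map_location=device)
```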
Based on the warnings and errors raised, I would recommend creating an issue in the Torch-TensorRT GitHub repository so that the developers can help debug this.