Trying to find compatible versions between two different environments

I’m trying to save a serialized TensorRT-optimized model using torch_tensorrt in one environment and then load it in another. The two environments have different GPUs: one has a Quadro M1000M and the other has a Tesla P100.

In neither environment do I have full sudo control (e.g. I can’t change the NVIDIA driver), but I can install different CUDA toolkits locally, and I can pip install wheels.

I have tried:
env #1 =

  1. Tesla P100,
  2. Nvidia driver 460,
  3. CUDA 11.3 (checked via torch.version.cuda); nvidia-smi shows 11.2. Many CUDA versions are installed, from 10.2 to 11.4
  4. CuDNN 8.2.1.32
  5. TensorRT 8.2.1.8
  6. Torch_TensorRT 1.0.0
  7. Pytorch 1.10.1+cu113 (conda installed)

env #2 =

  1. Quadro M1000M
  2. Nvidia driver 455
  3. CUDA 11.3 (checked via torch.version.cuda; running in backwards compatibility mode, I believe, since 11.3 technically requires NVIDIA driver 460+ according to the compatibility table); nvidia-smi shows 11.1. CUDA 10.2 is also available, aside from the 11.3 I installed.
  4. CuDNN 8.2.1.32
  5. TensorRT 8.2.1.8
  6. Torch_TensorRT 1.0.0
  7. Pytorch 1.10.1+cu113 (pip installed)
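Since the comparison above was assembled by hand, a small script run in each environment can make the two lists easier to diff. The following is only a sketch: the helper names `collect_versions` and `collect_gpu_info` are made up for illustration, and it assumes the packages expose a `__version__` attribute. Anything that fails to import is reported rather than raising.

```python
import importlib
import platform


def collect_versions(modules=("torch", "tensorrt", "torch_tensorrt")):
    """Return a dict mapping each package name to its version string."""
    report = {"python": platform.python_version()}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            report[name] = getattr(mod, "__version__", "unknown")
        except Exception as exc:  # ImportError, or OSError for missing shared libs
            report[name] = "unavailable ({})".format(type(exc).__name__)
    return report


def collect_gpu_info():
    """Return the CUDA/cuDNN versions PyTorch was built with, plus per-GPU details."""
    info = {"cuda": None, "cudnn": None, "gpus": []}
    try:
        import torch
    except Exception:
        return info  # PyTorch not importable: leave the defaults in place
    info["cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            # (name, (major, minor)) where the tuple is the compute capability
            info["gpus"].append(
                (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
            )
    return info


if __name__ == "__main__":
    print(collect_versions())
    print(collect_gpu_info())
```

Running it in env #1 and env #2 and diffing the output would also surface the compute capability of each GPU, which the hand-written lists do not include.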

So as you can see the only difference is really the GPU and the NVIDIA driver (455 vs 460).
Is this supposed to work?
On env #1, I can compile any model with torch_tensorrt.
On env #2, I run into issues if I try to compile any slightly complex model (e.g. resnet34), where it says:
WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.1
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 1: [wrapper.cpp::plainGemm::197] Error Code 1: Cublas (CUBLAS_STATUS_NOT_SUPPORTED)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )

If I try to “torch.jit.load” any model compiled in env #1 (even the simplest ones, like a model with a single conv2d layer) on env #2, I get the following error message:
~/.local/lib/python3.6/site-packages/torch/jit/_serialization.py in load(f, map_location, _extra_files)
159 cu = torch._C.CompilationUnit()
160 if isinstance(f, str) or isinstance(f, pathlib.Path):
→ 161 cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
162 else:
163 cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:44] Expected most_compatible_device to be true but got false
No compatible device was found for instantiating TensorRT engine
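One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: TensorRT engines are, by default, tied to the kind of GPU they were built on, and the two GPUs here belong to different architectures (the Quadro M1000M is Maxwell, compute capability 5.0; the Tesla P100 is Pascal, compute capability 6.0). The sketch below only illustrates that comparison. The table and the `same_architecture` helper are hypothetical and are not part of Torch-TensorRT.

```python
# Compute capabilities for the two GPUs discussed in this thread.
# The dictionary keys are illustrative labels, not the exact strings
# that torch.cuda.get_device_name() would return.
KNOWN_CAPABILITY = {
    "Quadro M1000M": (5, 0),  # Maxwell
    "Tesla P100": (6, 0),     # Pascal
}


def same_architecture(build_gpu, target_gpu, table=KNOWN_CAPABILITY):
    """Return True if both GPUs share the same compute capability.

    Assumption: an engine serialized on build_gpu is only expected to
    load on target_gpu when this returns True.
    """
    return table[build_gpu] == table[target_gpu]


if __name__ == "__main__":
    print(same_architecture("Tesla P100", "Quadro M1000M"))  # False
```

If that reading is right, matching the driver and library versions would not be enough on its own; the model would need to be compiled with torch_tensorrt on the machine (or at least the GPU type) where it will run.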

Environment

Explained above

Based on the raised warnings and errors, I would recommend creating an issue in the Torch-TensorRT GitHub repository so that the devs can try to debug the problem.

Thanks, I created it here in case anyone wants to follow the issue: