Hi,
Following the appropriate topic, I'm trying to serve my converted model with NVIDIA Triton, but I'm getting this error message:
Deployment Response Error, Status Code: 400, Reason: load failed for model 'SwinT-pt': version 1 is at UNAVAILABLE state: Internal: failed to load model 'SwinT-pt': [Error thrown at core/runtime/TRTEngine.cpp:62] Expected (cuda_engine.get() != nullptr) to be true but got false
Unable to deserialize the TensorRT engine
The model was converted with Torch-TensorRT using this code:
import torch
import torch_tensorrt

exp_program = torch.export.export(model, tuple(inputs))
trt_gm = torch_tensorrt.dynamo.compile(exp_program, inputs, enabled_precisions=[torch.float], output_format="torchscript", version_compatible=True)
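The compiled module is then saved as TorchScript for Triton. A minimal sketch of that step, assuming the usual repository path (with output_format="torchscript", trt_gm is a torch.jit.ScriptModule, so torch.jit.save applies):

# save the compiled ScriptModule where Triton's pytorch_libtorch backend expects it
torch.jit.save(trt_gm, "model_repository/SwinT-pt/1/model.pt")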
The model configuration file config.pbtxt is:
name: "SwinT-pt"
platform: "pytorch_libtorch"
max_batch_size: 2
dynamic_batching {
  max_queue_delay_microseconds: 2000
}
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [2, 256, 256]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [4]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
model_warmup [
  {
    name: "warmup_sample"
    batch_size: 2
    count: 5
    inputs {
      key: "input__0"
      value: {
        data_type: TYPE_FP32
        dims: [2, 256, 256]
        random_data: true
      }
    }
  }
]
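For reference, the model repository uses the standard Triton layout (model.pt is the default file name the pytorch_libtorch backend looks for):

model_repository/
└── SwinT-pt/
    ├── config.pbtxt
    └── 1/
        └── model.pt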
Also, the model can be loaded through the Python API inside the Triton Docker container.
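Roughly this check, with the path assumed from the layout above (importing torch_tensorrt is what registers the TensorRT runtime ops needed to deserialize the module):

import torch
import torch_tensorrt  # registers the TensorRT ops required for deserialization

# this succeeds inside the container, so the TorchScript file itself deserializes fine
model = torch.jit.load("/models/SwinT-pt/1/model.pt").cuda()
out = model(torch.randn(2, 2, 256, 256, device="cuda"))  # batch of 2, per-sample shape [2, 256, 256]
print(out.shape)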
Do you have any ideas what could have caused this? Thank you.