Hi,
Following the appropriate topic, I'm trying to serve my converted model with NVIDIA Triton, but I'm getting this error message:
Deployment Response Error, Status Code: 400, Reason: load failed for model 'SwinT-pt': version 1 is at UNAVAILABLE state: Internal: failed to load model 'SwinT-pt': [Error thrown at core/runtime/TRTEngine.cpp:62] Expected (cuda_engine.get() != nullptr) to be true but got false
Unable to deserialize the TensorRT engine
The model was converted with Torch-TensorRT using this code:
import torch
import torch_tensorrt

exp_program = torch.export.export(model, tuple(inputs))
trt_gm = torch_tensorrt.dynamo.compile(exp_program, inputs, enabled_precisions=[torch.float], output_format="torchscript", version_compatible=True)
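The compiled module is then saved as TorchScript for Triton. A minimal sketch of that step, assuming the usual repository path (with output_format="torchscript", trt_gm is a torch.jit.ScriptModule, so torch.jit.save applies):

# save the compiled ScriptModule where Triton's pytorch_libtorch backend expects it
torch.jit.save(trt_gm, "model_repository/SwinT-pt/1/model.pt")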
The model configuration file config.pbtxt is:
name: "SwinT-pt"
platform: "pytorch_libtorch"
max_batch_size: 2
dynamic_batching {
  max_queue_delay_microseconds: 2000
}
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [2, 256, 256]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [4]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
model_warmup [
  {
    name: "warmup_sample"
    batch_size: 2
    count: 5
    inputs {
      key: "input__0"
      value: {
        data_type: TYPE_FP32
        dims: [2, 256, 256]
        random_data: true
      }
    }
  }
]
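For reference, the model repository uses the standard Triton layout (model.pt is the default file name the pytorch_libtorch backend looks for):

model_repository/
└── SwinT-pt/
    ├── config.pbtxt
    └── 1/
        └── model.pt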
Also, the model can be loaded through the Python API inside the Triton Docker container.
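Roughly this check, with the path assumed from the layout above (importing torch_tensorrt is what registers the TensorRT runtime ops needed to deserialize the module):

import torch
import torch_tensorrt  # registers the TensorRT ops required for deserialization

# this succeeds inside the container, so the TorchScript file itself deserializes fine
model = torch.jit.load("/models/SwinT-pt/1/model.pt").cuda()
out = model(torch.randn(2, 2, 256, 256, device="cuda"))  # batch of 2, per-sample shape [2, 256, 256]
print(out.shape)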
Do you have any ideas what could have caused this? Thank you.