When converting from diffusers to ONNX and then to TRT, I notice warnings like:
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
Is this because the torch-to-ONNX conversion is converting float32 weights to int64, and the TRT engine build then tries to cast those down to int32, leading to the warning? Considering these values are only going to be converted again to fp16, this seems like a poor way to do it.
Going from float32 to float16 directly should require far less clamping than float32 → int64 → int32 → float16.
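To make the concern concrete, here is a toy numpy sketch of the chain described above (this is purely illustrative of the hypothesized float32 → int64 → int32 → float16 path, not a trace of what the exporter actually does internally):

```python
import numpy as np

INT32_MIN = np.iinfo(np.int32).min
INT32_MAX = np.iinfo(np.int32).max

def via_int_chain(w):
    """Hypothetical float32 -> int64 -> clamp to int32 -> float16 chain."""
    as_i64 = w.astype(np.int64)                      # fractional parts are truncated here
    clamped = np.clip(as_i64, INT32_MIN, INT32_MAX)  # the "cast down to INT32" step
    return clamped.astype(np.float16)

def direct_f16(w):
    """float32 -> float16, with no integer intermediate."""
    return w.astype(np.float16)

weights = np.array([0.5, -1.25, 100.0], dtype=np.float32)
print(direct_f16(weights))     # fractional values survive
print(via_int_chain(weights))  # fractional values are destroyed: 0.5 -> 0, -1.25 -> -1
```

If typical sub-1.0 float weights really did pass through an integer dtype, they would be wiped out entirely, which is far worse than mere clamping.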
I'm wondering if this might be the reason the Stable Diffusion images rendered by TensorRT are of noticeably lower quality than normal PyTorch inference at fp16.
Even though TensorRT doesn't support int64, wouldn't another viable approach be to convert from int64 directly to float16, without the intermediate int32?
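As a quick sanity check on that idea (again a numpy sketch, not TensorRT internals), one can compare both routes for a large int64 value such as INT64_MAX, which PyTorch's exporter commonly emits as a sentinel in Slice/Gather ops:

```python
import numpy as np

i32_max = np.iinfo(np.int32).max
sentinel = np.int64(2**63 - 1)  # INT64_MAX, a common sentinel in exported ONNX graphs

# Route A: int64 -> clamp to int32 -> float16 (what the TRT warning describes)
via_i32 = np.float16(np.clip(sentinel, -i32_max - 1, i32_max))

# Route B: int64 -> float16 directly (the proposed shortcut)
direct_i64 = np.float16(sentinel)

print(via_i32, direct_i64)  # both overflow to inf, since fp16 tops out at 65504
```

Since fp16's maximum finite value (65504) is far below the int32 range, any int64 value large enough to be clamped would overflow fp16 either way, so the two routes may not differ much in practice for these tensors.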