Just wondering why an INT8 quantized model takes much longer to load via torch.jit.load() than the fp32 model. It seems very weird, because the saved file (from torch.jit.save()) of the INT8 quantized model is 4x smaller than that of the fp32 model.
Has anyone else run into this issue? Or is there any way to reduce the loading time of quantized models so it's comparable to fp32 models?
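For reference, here is a minimal timing harness I'd use to measure the gap (a sketch, stdlib only: `pickle` stands in for `torch.jit.load` so it runs anywhere; for the actual comparison, swap in `torch.jit.load` and point it at the saved fp32 and INT8 files, whose names here are just placeholders):

```python
import os
import pickle
import tempfile
import time

def time_load(path, load_fn, n_runs=5):
    """Average wall-clock time (seconds) to deserialize a file over n_runs."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        load_fn(path)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

def pickle_load(path):
    # Stand-in loader; replace with torch.jit.load for the real test, e.g.:
    #   time_load("model_fp32.pt", torch.jit.load)   # hypothetical file name
    #   time_load("model_int8.pt", torch.jit.load)   # hypothetical file name
    with open(path, "rb") as f:
        return pickle.load(f)

# Demo with a small pickled object so the harness is self-contained.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(list(range(10_000)), f)
    path = f.name

avg = time_load(path, pickle_load)
size = os.path.getsize(path)
print(f"avg load time: {avg * 1e3:.3f} ms, file size: {size} bytes")
os.unlink(path)
```

Measuring several runs and averaging avoids being misled by first-load effects (cold file cache, one-time initialization), which matter a lot when comparing two serialized models.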