Hello everyone. I recently loaded a TorchScript model in C++. When I use the model for inference, the first pass takes about 20s, while subsequent passes take only about 0.5s.
Has anyone ever done any related work or met the same problem?
Is there any way to disable the optimization, choose an optimization level, or save the model (or the computation graph) after optimization?
Or is it an inevitable warm-up process?
I'd appreciate it if anybody can help me! Thanks in advance!
It’s likely that you’re correct in the assessment that optimization is what’s making the first pass slow. However, 20 seconds seems pretty high and we’d like to understand what exactly is happening here. Do you mind sharing your serialized model file so we can have a look? Also, which version of PyTorch are you using?
We don’t currently have an optimization level option.
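That said, one thing worth experimenting with is the torch.jit.optimized_execution context manager, which disables the graph executor's optimizations for a scope. This is a sketch, not an official knob for this problem (its exact effect varies across releases, and the tiny module here just stands in for a real model):

```python
import torch

# Toy module standing in for the real model (assumption for illustration).
class Small(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

scripted = torch.jit.script(Small())
x = torch.randn(4)

# Run with the JIT's graph-executor optimizations disabled for this block.
# This may trade steady-state speed for a cheaper first pass.
with torch.jit.optimized_execution(False):
    out = scripted(x)
```

Whether this actually reduces the first-pass cost for a large model is something you would have to measure.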
Thanks for the reply. I built PyTorch from source; the version is 1.4.0a0+93db2b8.
The model is a modified transformer. I convert it to a TorchScript model using script_model = torch.jit.script(model) (plus, of course, some other work to make it JIT-compatible).
By the way, I use torch.quantization.quantize_dynamic to quantize the model; the first pass then takes about 12.5s.
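For reference, the dynamic-quantization step looks roughly like this (a minimal sketch with a toy module, not the actual transformer):

```python
import torch

# Toy module standing in for the transformer (assumption for illustration).
class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyNet().eval()

# Replace Linear layers with dynamically quantized (int8-weight) versions.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
out = qmodel(torch.randn(2, 8))
```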
I only use torch.jit.script(); should I mix tracing and scripting?
And how can I submit the serialized model file to you? For some reason I can't give you the modified transformer, but I can provide the serialized model file of the original transformer, whose first pass takes 32.83s while the others take about 9s (yes, roughly a 20s gap again). The original and modified transformers have the same network architecture but different inference functions, so they share the same first-pass problem.
Did you solve the scripted-model "warm up" problem?
The simplest way to reproduce this problem is:
import torchvision, torch, time

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model = torch.jit.script(model)
model.eval()
x = [torch.randn((3, 224, 224))]
for i in range(3):
    start = time.time()
    model(x)
    print('Time elapsed: {}'.format(time.time() - start))
Output:
Time elapsed: 38.297527551651
Time elapsed: 6.655704021453857
Time elapsed: 6.651334762573242
So, can anybody explain how I can load and run a scripted model without this "warm up"?
Thanks.
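Absent a way to skip the optimization entirely, one common workaround is to pay the warm-up cost at load time: run a couple of dummy forward passes right after loading, so the first real request runs at steady-state speed. A sketch (the warm_up helper and the toy module are illustrative, not part of any PyTorch API):

```python
import torch

def warm_up(model, example_input, passes=2):
    # Trigger the JIT's first-pass optimization before serving real inputs.
    with torch.no_grad():
        for _ in range(passes):
            model(example_input)

# Toy module standing in for the real scripted model (assumption).
class Small(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

scripted = torch.jit.script(Small())
warm_up(scripted, torch.randn(3, 224, 224))

# Subsequent calls now avoid the first-pass optimization cost.
out = scripted(torch.randn(3, 224, 224))
```

This doesn't make the warm-up go away, but it moves it to a point where it doesn't affect latency-sensitive requests.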