Speed of the first pass is very slow

Hello everyone. I recently loaded a TorchScript model in C++. When I use the model for inference, the first pass takes about 20s, while subsequent passes take only about 0.5s.

Has anyone done related work or run into the same problem?

Is there any way to disable the optimization, choose an optimization level, or save the model (or the computation graph) after optimization?

Or is it an inevitable warm-up process?

I'd appreciate it if anybody could help me! Thanks in advance!

Hello huoge,

It’s likely that you’re correct in the assessment that optimization is what’s making the first pass slow. However, 20 seconds seems pretty high and we’d like to understand what exactly is happening here. Do you mind sharing your serialized model file so we can have a look? Also, which version of PyTorch are you using?

We don’t currently have an optimization level option.
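One practical workaround is to pay the optimization cost up front: run a dummy pass or two right after loading, before serving real requests. Here's a minimal Python sketch, assuming a scripted model saved as model.pt and a placeholder input shape; the same idea applies when loading from the C++ API:

import torch

# Load the serialized TorchScript model (the file name is a placeholder).
model = torch.jit.load("model.pt")
model.eval()

# Dummy input matching the model's expected shape (placeholder shape).
dummy = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    # The first pass triggers the JIT optimization work; after one or two
    # passes the model runs at steady-state speed.
    for _ in range(2):
        model(dummy)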

James

Hello James,

Thanks for replying. I built PyTorch from source; the version is 1.4.0a0+93db2b8.
The model is a modified transformer. I convert it to a TorchScript model using script_model = torch.jit.script(model) (plus some other work to make it JIT-compatible).
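For reference, the conversion roughly follows this pattern; a minimal sketch with a toy module standing in for the modified transformer (all names, shapes, and the saved file name are placeholders):

import torch
import torch.nn as nn

# Toy stand-in for the modified transformer.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x, steps: int = 2):
        # Input-dependent control flow like this is what tracing would freeze;
        # scripting preserves it.
        for _ in range(steps):
            x = torch.relu(self.linear(x))
        return x

model = TinyModel().eval()
script_model = torch.jit.script(model)            # compile to TorchScript
torch.jit.save(script_model, "script_model.pt")   # serialize for loading from C++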

By the way, when I use torch.quantization.quantize_dynamic to quantize the model, the first pass costs about 12.5s.
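That quantization step roughly follows this pattern, sketched with a placeholder model in place of the real transformer:

import torch
import torch.nn as nn

# Placeholder model standing in for the transformer.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16)).eval()

# Dynamic quantization converts the Linear weights to int8 and
# quantizes activations on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model can then be scripted as before.
script_model = torch.jit.script(quantized)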

This is part of print(script_model):

I only use torch.jit.script(); should I mix tracing and scripting?

And how can I send you the serialized model file? For some reasons I can't give you the modified transformer, but I can provide the serialized model file of the original transformer, whose first pass costs 32.83s while the others take about 9s (yes, the gap is about 20s again). The original and modified transformers have the same network architecture but different inference functions, so they show the same problem on the first pass.

Can anyone help me with this issue?

Did you solve the scripted model "warm up" problem?
The simplest way to reproduce the problem is:

import torchvision, torch, time

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model = torch.jit.script(model)
model.eval()
x = [torch.randn((3, 224, 224))]
for i in range(3):
    start = time.time()
    model(x)
    print('Time elapsed: {}'.format(time.time() - start))

Output:
Time elapsed: 38.297527551651
Time elapsed: 6.655704021453857
Time elapsed: 6.651334762573242

So, can anybody explain how I can load and run a scripted model without this "warm up"?
Thanks.

Maybe this post helps? Speed of Custom RNN is SUPER SLOW


Thanks, with torch.jit.optimized_execution(False): really helped.
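In case it helps others, here is roughly how that workaround looks when loading and running a scripted model (a minimal sketch; the file name and input shape are placeholders):

import torch

model = torch.jit.load("model.pt")
model.eval()

x = torch.randn(1, 3, 224, 224)

# Disabling the optimizing executor avoids the expensive first-pass
# optimization work, at the cost of not getting those optimizations later.
with torch.no_grad(), torch.jit.optimized_execution(False):
    out = model(x)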