Hello everyone. I recently loaded a TorchScript model in C++. When I run inference with it, the first forward pass takes about 20 s, while subsequent passes take only about 0.5 s.
Has anyone done related work or run into the same problem?
Is there any way to disable the optimization, choose the optimization level, or save the model (or the computation graph) after optimization has run?
Or is this an inevitable warm-up process?
I'd appreciate it if anyone could help. Thanks in advance!