Greetings. A couple of questions regarding x86 inference backend(s):
- Assume that I produced a model via torch.jit.script(). Is it correct to think that essentially the same JIT runtime + BLAS implementation are used regardless of whether this model is subsequently evaluated via the Python or the C++ interface? (assuming that the C++ app is compiled and linked against the lib{torch, torch_cpu, c10}.so that ship with the PyTorch distribution)
Put differently, other than being able to run in a Python-less environment, are there performance or other benefits to doing model inference via the C++ API?
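For concreteness, this is roughly the workflow I mean (a minimal sketch; the model and file name are just placeholders):

```python
import torch

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.t())

# Script and serialize; my understanding is that the same .pt archive
# can then be loaded either from Python (torch.jit.load) or from a C++
# app via torch::jit::load, backed by the same libtorch runtime.
scripted = torch.jit.script(Net())
scripted.save("model.pt")
loaded = torch.jit.load("model.pt")
```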
- I read everything I could find about fbgemm but remain confused as to when it actually kicks in. Does it get used only if the model contains fused or quantized ops?
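To make the quantized case concrete, here is the kind of model I have in mind (a hypothetical minimal example using dynamic quantization; my understanding is that the resulting quantized linear op is what would be routed to fbgemm on x86, but that is exactly what I'm asking about):

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

model = Tiny().eval()
# Dynamic quantization swaps nn.Linear for a quantized linear op.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
out = qmodel(torch.randn(2, 16))
```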
Thank you.