Greetings. A couple of questions regarding x86 inference backend(s):
- Assume that I produced a model via torch.jit.script(). Is it correct to think that essentially the same JIT runtime + BLAS implementation are used regardless of whether this model is subsequently evaluated via the Python or the C++ interface? (assuming that the C++ app is compiled and linked against the lib{torch, torch_cpu, c10}.so that ship with the PyTorch distribution)
Put differently, other than being able to run in a Python-less environment, are there performance or other benefits to doing model inference via the C++ API?
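For concreteness, this is roughly the workflow I mean (a minimal sketch; the model and file name are just placeholders):

```python
import torch

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.t())

# Script and serialize; my understanding is that the same .pt archive
# can then be loaded either from Python (torch.jit.load) or from a C++
# app via torch::jit::load, backed by the same libtorch runtime.
scripted = torch.jit.script(Net())
scripted.save("model.pt")
loaded = torch.jit.load("model.pt")
```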
- I read everything I could find about fbgemm but remain confused as to when it actually kicks in. Does it get used only if the model contains fused or quantized ops?
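To make the quantized case concrete, here is the kind of model I have in mind (a hypothetical minimal example using dynamic quantization; my understanding is that the resulting quantized linear op is what would be routed to fbgemm on x86, but that is exactly what I'm asking about):

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

model = Tiny().eval()
# Dynamic quantization swaps nn.Linear for a quantized linear op.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
out = qmodel(torch.randn(2, 16))
```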
Thank you.