FBGEMM & Python Performance

I’m deploying a torch.jit traced model to an x86 environment. I’d ideally like to do this in Python, but the tutorials I’ve found are all in C++. Is it possible to get the full performance of the FBGEMM backend from Python, or should I really be using C++?

A 20-30% performance hit from staying in Python is acceptable, but if C++ would be several times faster, I’ll go with that.
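
For context, here is a rough sketch of how the Python side could be loaded and timed, so that any C++ comparison starts from a measured baseline. The file path and input shape passed to `time_traced_model` are placeholders, and both helper names are made up for illustration. It assumes the model takes a single float tensor, and it only selects FBGEMM when the installed build reports it as a supported quantized engine.

```python
import statistics
import time


def median_latency_ms(fn, warmup=10, iters=100):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up calls let any lazy initialisation or JIT optimisation settle.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def time_traced_model(path, input_shape):
    """Load a TorchScript file and time its forward pass on random input.

    `path` and `input_shape` are placeholders for your own model file and
    input dimensions (hypothetical, not taken from any tutorial).
    """
    import torch  # imported here so the timing helper works without torch

    # FBGEMM is the x86 backend for quantized ops; select it if available.
    if "fbgemm" in torch.backends.quantized.supported_engines:
        torch.backends.quantized.engine = "fbgemm"

    model = torch.jit.load(path)
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        return median_latency_ms(lambda: model(x))
```

Calling something like `time_traced_model("model_traced.pt", (1, 3, 224, 224))` would then give a number to compare against the same model loaded from C++.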