Why is torch.jit.script slower?

googlebot · May 4, 2021, 4:20pm

In a nutshell,

- “Compilation” analyzes whole functions with knowledge of variable types, and some optimizations happen at this level (e.g. dead code elimination).
- The Python bytecode interpreter is not used to execute the generated code; a more specialized executor for statically typed code supposedly runs faster.
- Fusion optimizations further compile specialized CUDA kernels, so e.g. a.mul(b).add(c) is computed in one go.
- Some patterns have specialized optimizations, e.g. conv+batchnorm.
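
To make the points above concrete, here is a minimal sketch of scripting a small function and checking it against eager mode. The function name `mul_add` and the tensor sizes are made up for illustration. It runs on CPU, so it does not demonstrate CUDA kernel fusion itself; whether the unused line is eliminated or the ops are fused depends on the PyTorch version, the device, and the executor settings.

```python
import torch


@torch.jit.script
def mul_add(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # Never read afterwards: a candidate for dead code elimination.
    unused = a.sub(b)
    # Elementwise chain that a fuser may combine into a single kernel on GPU.
    return a.mul(b).add(c)


a, b, c = torch.rand(4), torch.rand(4), torch.rand(4)

eager_result = a.mul(b).add(c)      # plain Python/eager execution
scripted_result = mul_add(a, b, c)  # runs through the TorchScript executor

# Scripting should not change the numerical result.
matches = torch.allclose(eager_result, scripted_result)

# The TorchScript IR is available for inspection.
graph_text = str(mul_add.graph)
print(graph_text)
```

Timing only the first call or two can be misleading, because the executor may profile and optimize during early invocations; a fair comparison against eager mode usually runs a few warm-up calls first.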