JIT performance

Spent the better part of the day trying to script a couple of facial detection and recognition models for inference, in the hope of seeing some performance improvement, and was very disappointed.

  1. Scripting is pretty restrictive: some very benign code doesn’t compile properly and forced me to exclude it from JIT compilation. In my case it was the handling of a list of tensors that errored, as the JIT doesn’t know how to handle such objects.
  2. One of the simpler models actually scripted very easily but… showed a degradation for the first couple of passes and then performance identical to plain Python (CPython).
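For what it’s worth, the list-of-tensors failure in point 1 often comes down to TorchScript’s type inference rather than a hard limitation: annotating the argument as `List[torch.Tensor]` frequently lets otherwise-benign code compile. A minimal sketch (the function name and shapes are hypothetical, just standing in for per-face detection outputs):

```python
from typing import List

import torch

# Hypothetical helper illustrating the issue: without the explicit
# List[torch.Tensor] annotation, TorchScript may fail to infer the
# element type and refuse to compile the function.
@torch.jit.script
def stack_detections(boxes: List[torch.Tensor]) -> torch.Tensor:
    # Concatenate per-face box tensors into a single (N, 4) tensor.
    return torch.cat(boxes, dim=0)

out = stack_detections([torch.zeros(2, 4), torch.zeros(3, 4)])
print(out.shape)  # torch.Size([5, 4])
```

If annotation alone doesn’t help, `@torch.jit.ignore` on the offending method keeps the rest of the model scriptable.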

I was reading through this and was really hopeful, but at this point (PyTorch 1.7.1 / CUDA 11.2 / cuDNN 8.0.5), with my relatively common pretrained models, I am not seeing anything positive. Has anyone been able to verify performance improvements? I have experience with Lua/LuaJIT, which shows drastic speed enhancements.
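In case it helps anyone reproduce the comparison in point 2: the slow first passes are expected, since TorchScript specializes and optimizes the graph during early calls, so any timing needs a warm-up phase. A rough harness I’d use (the toy `Sequential` model is just a stand-in for the pretrained detectors, which are an assumption here):

```python
import time

import torch

def bench(fn, x, warmup: int = 10, iters: int = 100) -> float:
    # Discard the first few passes: the JIT does profiling and graph
    # optimization during early calls, so only steady state is comparable.
    for _ in range(warmup):
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

# Toy model standing in for a real pretrained network.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
scripted = torch.jit.script(model)
x = torch.randn(8, 64)

print(f"eager:    {bench(model, x) * 1e6:.1f} us/iter")
print(f"scripted: {bench(scripted, x) * 1e6:.1f} us/iter")
```

On CUDA you would also need `torch.cuda.synchronize()` around the timed region, since kernel launches are asynchronous; without it the numbers measure launch overhead rather than actual compute.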