What are the differences between JIT and tensor_comprehensions for custom kernels?

Specifically, I’m asking about writing TC-like loops in JIT-ed functions.
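To make it concrete, here’s a minimal sketch of what I mean (names and shapes are made up, not from my actual code; in TC syntax this would be roughly `out(i) +=! x(i, k) * w(k)`):

```python
import torch

@torch.jit.script
def weighted_rowsum(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # x: (N, K), w: (K,) -> out: (N,)
    out = torch.zeros(x.size(0), dtype=x.dtype, device=x.device)
    for i in range(x.size(0)):
        # the explicit indexed reduction is the "TC-like" part
        out[i] = (x[i] * w).sum()
    return out
```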

My issues are:

  1. I haven’t been able to get good performance out of jit.script. My use case might be a little too dynamic? (There’s a made-up example of the kind of thing I mean after this list.)
  2. When JIT-ing, I have no control over what optimizations are applied. I can’t nudge the jitter to fuse a particular sequence of operations, for example, so I can’t use it to get rid of OOM errors (see the second sketch after this list).
  3. Writing the computation as explicit loops is generally slower than composing existing maps and reductions, even though those allocate (memory-hungry) intermediates; the second sketch below shows both variants.
  4. tensor_comprehensions is not available on Windows, as far as I can see :confused:
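To illustrate point 1: by “dynamic” I mean things like data-dependent control flow. This is a stand-in example, not my actual code, but scripted functions of this shape have been slow for me:

```python
import torch

@torch.jit.script
def clip_rows(x: torch.Tensor, limit: float) -> torch.Tensor:
    # per-row, data-dependent branching; my guess is that this
    # kind of control flow is what keeps the fuser from helping
    out = torch.empty_like(x)
    for i in range(x.size(0)):
        row = x[i]
        peak = row.abs().max()
        if bool(peak > limit):
            out[i] = row * (limit / peak)
        else:
            out[i] = row
    return out
```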
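And for points 2 and 3, here’s the trade-off I keep running into, sketched on pairwise squared distances (purely illustrative). The broadcasting version materializes an (N, M, D) intermediate, which is exactly what OOMs and exactly what I’d want the jitter to fuse away; the scripted loop keeps memory bounded but runs far slower for me:

```python
import torch

def pdist_vectorized(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # broadcasting builds the full (N, M, D) intermediate before
    # reducing: fast, but that intermediate is what blows up memory
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(dim=2)

@torch.jit.script
def pdist_looped(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # one row at a time: peak memory stays at O(M * D),
    # but this is much slower for me under jit.script
    out = torch.zeros(a.size(0), b.size(0), dtype=a.dtype, device=a.device)
    for i in range(a.size(0)):
        out[i] = ((a[i] - b) ** 2).sum(dim=1)
    return out
```

If the jitter could fuse the subtract/square/sum chain in the first version, the big intermediate would never exist, but I don’t see any way to ask it for that.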