Is there a mechanism for fusing element-wise kernels in C++/libtorch?

Assume I do both model definition/training and inference in C++, so we aren't in the usual use case where a model was already scripted and trained in Python/PyTorch. Right now I'm fusing ops by hand with custom CUDA kernels when working with libtorch/C++, because I assumed the equivalent of @torch.jit.script isn't possible in pure C++ unless there's a way to define functions "symbolically" in libtorch. But I'm still wondering whether I'm wrong and automatic fusion does exist in libtorch/C++.
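For concreteness, here's the kind of thing I mean by defining a function "symbolically" in C++. torch::jit::compile does exist in libtorch and builds a TorchScript CompilationUnit from a source string, though I don't know whether this path goes through the same fusion pass as @torch.jit.script, which is really what I'm asking (minimal sketch, function name and ops are just a made-up example):

```cpp
#include <torch/torch.h> // pulls in torch/jit.h, where torch::jit::compile is declared
#include <iostream>

int main() {
  // Compile a TorchScript function from a source string at runtime.
  // This goes through the same script frontend as @torch.jit.script,
  // so the element-wise ops should at least be *eligible* for fusion
  // on CUDA tensors -- whether the fuser actually kicks in from C++
  // is exactly what I'm unsure about.
  auto cu = torch::jit::compile(R"JIT(
    def scale_shift_relu(x, y):
        return torch.relu(x * y + 1.0)
  )JIT");

  auto x = torch::randn({1024}, torch::kCUDA);
  auto y = torch::randn({1024}, torch::kCUDA);

  // run_method executes the compiled graph and returns an IValue.
  auto out = cu->run_method("scale_shift_relu", x, y);
  std::cout << out.toTensor().sizes() << "\n";
  return 0;
}
```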

Thank you!
