Torch.export with custom_op

I was writing CUDA kernels using the CuTe DSL. My hope is to define those kernels as PyTorch custom ops and leverage torch.export to capture them (maybe as CUDA graphs?).

Is it really possible to use torch.export with custom ops?

As per the torch.export tutorial:

The graph produced by torch.export contains only ATen operators, which are the basic unit of computation in PyTorch.

All the tutorials about custom ops target torch.compile rather than torch.export.

So, in summary, the question is:
Is it possible to export CuTe DSL kernels (or Triton, or any other Python CUDA kernel programming format) with torch.export?
If not, is it possible with kernels written in C++? Because the C++ CUDA tutorial only covers torch.compile.