How to let torch.compile only target GPU

For context: I’d like to help non-tech-savvy users enjoy JIT acceleration in applications like ComfyUI and Unsloth. Recently I published triton-windows wheels to PyPI, which bundle TinyCC and ptxas, so users can ‘just pip install’ them without manually setting up compiler toolchains (which is a pain on Windows). It also helps downstream developers properly package their applications.

This works as long as torch.compile targets the GPU. However, users still need a C++ compiler if they ‘accidentally’ run torch.compile targeting the CPU. If I understand correctly, this can happen because of graph breaks, although I don’t yet have a minimal reproducer for this.

Is there a mechanism to let torch.compile only target GPU, and skip compiling the CPU part?
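Here is a sketch of the kind of behavior I’m after, in case it helps clarify the question. It wraps Inductor in a custom TorchDynamo backend that falls back to eager execution whenever a captured subgraph has CPU example inputs, so Inductor’s C++ codegen is never invoked. Note that `torch._dynamo.backends.inductor.inductor` is an internal entry point, so this is only an illustration, not something I’d ship as-is:

```python
import torch


def gpu_only_backend(gm, example_inputs):
    """Custom TorchDynamo backend (sketch): compile GPU subgraphs with
    Inductor, but run CPU subgraphs eagerly so no C++ compiler is needed."""
    has_cpu_input = any(
        isinstance(t, torch.Tensor) and t.device.type == "cpu"
        for t in example_inputs
    )
    if has_cpu_input:
        # Skip Inductor entirely: execute the captured FX graph eagerly.
        return gm.forward
    # Delegate GPU-only graphs to the default Inductor backend.
    # NOTE: internal API, may change between PyTorch releases.
    from torch._dynamo.backends.inductor import inductor
    return inductor(gm, example_inputs)


def f(x):
    return torch.relu(x) * 2 + 1


compiled = torch.compile(f, backend=gpu_only_backend)
```

A cruder alternative I’m aware of is `torch._dynamo.config.suppress_errors = True`, which falls back to eager when backend compilation fails, but that swallows all compile errors rather than just the CPU ones, so it doesn’t feel like the right mechanism either.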

(By the way… I don’t think there is a C++ compiler toolchain small enough to bundle in the wheels yet complete enough for torch.compile targeting the CPU. If you know of one, please tell me.)