Torch Compile Custom Op

If I register a custom op using the torch.library.Library API that calls a triton.jit kernel, and then compile a module containing this custom op with cpp_wrapper enabled, is the cubin of the Triton kernel embedded in the generated C++/CUDA extension?
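For concreteness, here is a minimal sketch of the setup I mean. The namespace, op name, kernel, and block size are all made up for illustration; the only load-bearing parts are that the CUDA implementation launches a triton.jit kernel and that cpp_wrapper is enabled via the torch.compile options dict (which maps onto torch._inductor.config):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_one_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Elementwise x + 1, one program per BLOCK_SIZE chunk.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + 1.0, mask=mask)

# Register the op in a hypothetical "mylib" namespace.
lib = torch.library.Library("mylib", "DEF")
lib.define("add_one(Tensor x) -> Tensor")

def add_one_cuda(x):
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_one_kernel[grid](x, out, n, BLOCK_SIZE=1024)
    return out

lib.impl("add_one", add_one_cuda, "CUDA")
# Meta impl so torch.compile can fake-trace the op without graph breaks.
lib.impl("add_one", lambda x: torch.empty_like(x), "Meta")

compiled = torch.compile(
    lambda x: torch.ops.mylib.add_one(x),
    options={"cpp_wrapper": True},
)
x = torch.randn(4096, device="cuda")
print(compiled(x))
```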

How does this differ from a module containing only (non-custom) aten ops that is compiled with inductor, lowered into Triton kernels through the inductor lowering pipeline, and then emitted with the cpp_wrapper option?
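For comparison, a sketch of that second case, where inductor itself generates the Triton kernels rather than calling out to a user-written one (g and the shapes are made up):

```python
import torch

def g(x, y):
    # Only aten ops here; inductor lowers relu/matmul/add itself
    # and produces its own fused Triton kernels.
    return torch.relu(x @ y) + 1.0

g_compiled = torch.compile(g, options={"cpp_wrapper": True})

x = torch.randn(256, 256, device="cuda")
y = torch.randn(256, 256, device="cuda")
print(g_compiled(x, y))
```

Running either script with TORCH_LOGS="output_code" dumps the generated wrapper, so the two cases can be compared side by side; I'm mainly asking whether the custom op's cubin ends up embedded in that artifact or is still loaded through the Python-side Triton runtime.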