Understanding decompositions done in primtorch and autograd

Hi community, I am developing a hardware backend for TorchDynamo.
From an engineering perspective, what we care about most is the minimal subset of Prims IR / ATen IR that needs to be supported. That makes it key for us to understand how decompositions are triggered.

  1. We know that in some cases PyTorch falls back to eager execution, which means AOTAutograd and PrimTorch are not available for that code. In this scenario, how are we supposed to perform decompositions on our own? Would a copy of Inductor's decomposition list be enough for us?
  2. I searched through my logs with the TORCH_COMPILE_DEBUG environment variable enabled, but I want to see which decompositions actually happened. What is the suggested way to debug this?
  3. If we want to follow what is done for GPUs and generate code from Triton IR rather than from ATen/Prims ops, is there a minimal op set that we would have to implement?
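For questions 1 and 2, a minimal sketch of applying a decomposition table yourself (outside AOTAutograd) and then inspecting which ATen ops remain in the traced graph. It assumes a PyTorch 2.x install where `torch._decomp.get_decompositions` and `make_fx(..., decomposition_table=...)` are available; the chosen op list (`aten.gelu`, `aten.addmm`), the `model` function, and the `aten_targets` helper are hypothetical examples, not Inductor's actual table.

```python
import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

aten = torch.ops.aten

# Hypothetical choice: ops our backend does not implement natively, so we ask
# for PyTorch's registered decompositions of them into simpler ATen ops.
decomp_table = get_decompositions([aten.gelu, aten.addmm])

# Hypothetical model used only for illustration.
def model(x, w, b):
    return torch.nn.functional.gelu(torch.addmm(b, x, w))

x, w, b = torch.randn(4, 8), torch.randn(8, 16), torch.randn(16)

# Trace to an FX graph; ops found in decomp_table are replaced by their
# decompositions while tracing.
gm = make_fx(model, decomposition_table=decomp_table)(x, w, b)

# Hypothetical helper: collect the ATen ops that survive decomposition,
# i.e. the op set the backend would still have to support for this graph.
def aten_targets(graph_module):
    return {str(n.target) for n in graph_module.graph.nodes if n.op == "call_function"}

print(gm.graph)          # inspect the decomposed graph directly
print(sorted(aten_targets(gm)))
```

Comparing the op set with and without `decomposition_table` is one way to see which decompositions fired for a given graph.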

cc @SherlockNoMad who might have more context