NvFuser with torch.compile

trusira · September 7, 2023, 12:10am

Hi,
What is the status of Dynamo support for NvFuser? I could not find many references/tutorials on the topic and the ones I found seem deprecated and/or use TorchScript as the graph format.
Is it possible to use NvFuser with torch.compile?

Thanks!

ptrblck · September 7, 2023, 2:04am

nvFuser is deprecated and the default backend for TorchScript (which is also in maintenance mode) was switched back to NNC in this PR.

trusira · September 7, 2023, 4:52pm

Hi @ptrblck, thanks for the clarification. I see nvFuser listed as a prospective codegen backend for PyTorch 2.0 here (PyTorch 2.0 | PyTorch), and I believe that support is not available yet.
Can you elaborate on what either nvFuser or NNC can offer in this context given that we already have Triton generating code for GPUs?

ptrblck · September 7, 2023, 6:28pm

This is correct and the nvFuser support in TorchDynamo was removed here.

As mentioned before, nvFuser is also deprecated in TorchScript, and NNC is used as the default backend. Note that TorchScript is in “maintenance mode” and, I believe, won’t get any major updates or fixes anymore.

nvFuser development continues out-of-core in a standalone repository and we are exploring different directions.
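For anyone who wants to check their own install, here is a minimal sketch that lists the backends registered with TorchDynamo and looks for an `nvfuser` entry. It assumes `torch._dynamo.list_backends()` is available (PyTorch 2.0 or later); by default that call leaves out backends tagged as debug or experimental. The helper names `list_dynamo_backends` and `has_backend` are made up for this example and are not part of PyTorch.

```python
def list_dynamo_backends():
    """Return the backend names registered with TorchDynamo.

    Returns an empty list if torch is not installed, so the script
    still runs on a machine without PyTorch.
    """
    try:
        import torch._dynamo as dynamo
    except ImportError:
        return []
    # list_backends() skips debug/experimental backends by default.
    return sorted(dynamo.list_backends())


def has_backend(name, backends=None):
    """Case-insensitive check for a backend name (hypothetical helper)."""
    if backends is None:
        backends = list_dynamo_backends()
    return name.lower() in {b.lower() for b in backends}


if __name__ == "__main__":
    names = list_dynamo_backends()
    print("registered backends:", names)
    print("nvfuser registered:", has_backend("nvfuser", names))
```

If `nvfuser` is not in the list, `torch.compile(model, backend="nvfuser")` cannot be expected to work on that install, and the default `inductor` backend is the one to use.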

andreigh · September 16, 2023, 6:13pm

Is there any TorchDynamo backend that generates CUDA code (.cu) and compiles it using nvcc (similar to how Inductor uses g++ to compile fused C++)? AFAIK, Triton generates PTX code directly.

trusira · September 19, 2023, 10:10pm

AFAIK, the backend you’re describing doesn’t exist. Can you explain the motivation for having this capability, given we can already generate PTX?

andreigh · September 20, 2023, 2:00am

I was just curious whether this was ever part of the project. I guess one advantage would have been access to the nvcc compiler optimizations.

yiakwy-xpu-ml-framew · July 11, 2024, 4:11am

Hi @ptrblck, I think this is not accurate. The PR has been reverted. Also, I frequently see a warning in Megatron-LM that nvFuser is no longer supported in TorchScript. It does not tell me whether NNC is used when compilation optimization is applied with Dynamo.

ptrblck · July 11, 2024, 4:15am

The PR was initially reverted but then relanded in c913f385, as also shown further down in the same PR.