I'm familiar with the models from the PyTorch 2.0 paper that use `torch.compile` with the default Inductor backend. The PyTorch benchmark/CI repository has scripts to test them, but they all seem to use `torch.compile` on a single GPU. Is there a benchmark or model script to get started with `torch.compile` for multi-GPU, and what would be a good way to get started in that direction?
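For context, the simplest multi-GPU setup I've tried so far is just wrapping a `DistributedDataParallel` model in `torch.compile` and launching with `torchrun`. This is only a minimal sketch (the model, sizes, and fallback to a single-process CPU/gloo group when no GPUs or `torchrun` environment are present are my own placeholders), not an official benchmark script:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(compile_backend: str = "inductor"):
    """Minimal DDP + torch.compile sketch.

    Intended launch: torchrun --nproc-per-node=<N> this_script.py
    If the torchrun env vars are absent, defaults to a 1-process group
    so the script still runs standalone (e.g. on CPU with gloo).
    """
    # torchrun sets these; provide single-process defaults otherwise.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("LOCAL_RANK", "0")

    use_cuda = torch.cuda.is_available()
    dist.init_process_group("nccl" if use_cuda else "gloo")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank) if use_cuda else torch.device("cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Toy model; replace with a TorchBench model in practice.
    model = torch.nn.Linear(16, 16).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)

    # Compile the DDP-wrapped module; "inductor" is the default backend.
    compiled = torch.compile(ddp_model, backend=compile_backend)

    x = torch.randn(8, 16, device=device)
    out = compiled(x)
    out.sum().backward()  # DDP all-reduces gradients across ranks here
    dist.destroy_process_group()
    return out.shape
```

This exercises data parallelism only; I'm specifically wondering whether there are ready-made scripts that go beyond this (e.g. tensor/pipeline parallel) with Inductor.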
I know vLLM v2 uses `torch.compile`, but its backend appears to be a somewhat different (custom) one rather than stock Inductor: it reuses portions of Inductor, wrapping the Inductor component inside its own backend.
I'd like to explore a few multi-GPU models deployed using the Inductor backend of `torch.compile`.