I'm familiar with the models from the PyTorch 2.0 paper that use `torch.compile` with the default Inductor backend. The PyTorch benchmark/CI repository has scripts to test them, but they all seem to use `torch.compile` on a single GPU. Is there a benchmark or model script to get started with `torch.compile` for multi-GPU, and what would be a good way to get started in that direction?
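For context, the simplest multi-GPU setup I've tried so far is just wrapping a `DistributedDataParallel` model in `torch.compile` and launching with `torchrun`. This is only a minimal sketch (the model, sizes, and fallback to a single-process CPU/gloo group when no GPUs or `torchrun` environment are present are my own placeholders), not an official benchmark script:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(compile_backend: str = "inductor"):
    """Minimal DDP + torch.compile sketch.

    Intended launch: torchrun --nproc-per-node=<N> this_script.py
    If the torchrun env vars are absent, defaults to a 1-process group
    so the script still runs standalone (e.g. on CPU with gloo).
    """
    # torchrun sets these; provide single-process defaults otherwise.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("LOCAL_RANK", "0")

    use_cuda = torch.cuda.is_available()
    dist.init_process_group("nccl" if use_cuda else "gloo")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank) if use_cuda else torch.device("cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Toy model; replace with a TorchBench model in practice.
    model = torch.nn.Linear(16, 16).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)

    # Compile the DDP-wrapped module; "inductor" is the default backend.
    compiled = torch.compile(ddp_model, backend=compile_backend)

    x = torch.randn(8, 16, device=device)
    out = compiled(x)
    out.sum().backward()  # DDP all-reduces gradients across ranks here
    dist.destroy_process_group()
    return out.shape
```

This exercises data parallelism only; I'm specifically wondering whether there are ready-made scripts that go beyond this (e.g. tensor/pipeline parallel) with Inductor.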
I know vLLM v2 uses `torch.compile`, but its backend appears to be a somewhat different (custom) one rather than stock Inductor: it reuses portions of Inductor, wrapping the Inductor component inside its own backend.
I'd like to explore a few multi-GPU models deployed using the Inductor backend of `torch.compile`.