Issue with max-autotune mode in torch.compile

Prakhar · May 30, 2024, 9:19am

Hi,

I am getting the following warning when using the max-autotune mode with torch.compile:

"""
torch._inductor.utils: [WARNING] not enough SMs to use max_autotune_gemm mode
skipping cudagraphs due to ['non-cuda device in graph']
"""

Does this mean that I am automatically falling back to the max-autotune-no-cudagraphs mode? If so, can someone explain how much performance I could be losing?

Also, I am using an NVIDIA A100 GPU, so I don’t think this warning should have appeared at all.

ptrblck · May 30, 2024, 4:26pm

That’s right: your A100 should have 108 SMs, while the warning uses a threshold of 68, as seen here.
Did you enable MIG and reduce the SM count?
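
To check which SM count PyTorch actually sees, something like the sketch below might help. It reads `multi_processor_count` from `torch.cuda.get_device_properties` and compares it against the threshold of 68 quoted above. The names `ASSUMED_MIN_SMS`, `has_enough_sms`, and `report_visible_gpus` are hypothetical helpers for illustration, not part of PyTorch, and the threshold may differ in your PyTorch version.

```python
# Threshold quoted in the reply above; it may differ between PyTorch versions.
ASSUMED_MIN_SMS = 68


def has_enough_sms(sm_count, min_sms=ASSUMED_MIN_SMS):
    """Hypothetical helper: True if sm_count meets the assumed threshold."""
    return sm_count >= min_sms


def report_visible_gpus():
    """Print the SM count PyTorch sees for each visible CUDA device."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch.")
        return
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        sms = props.multi_processor_count
        verdict = "ok" if has_enough_sms(sms) else "below assumed threshold"
        print(f"cuda:{index} {props.name}: {sms} SMs ({verdict})")


if __name__ == "__main__":
    report_visible_gpus()
```

A full A100 should report 108 SMs here. A smaller number would suggest that the process is running on a MIG slice or on a different device than expected.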