Issue with max-autotune mode in torch.compile


I am getting the following warnings when I use the max-autotune mode with torch.compile:

torch._inductor.utils: [WARNING] not enough SMs to use max_autotune_gemm mode
skipping cudagraphs due to ['non-cuda device in graph']


Does this mean that I am automatically falling back to the max-autotune mode without CUDA graphs? If so, can someone explain how much performance I could be losing?

Also, I am using an NVIDIA A100 GPU, so I don’t think the SM warning should have appeared at all.

That’s right, since your A100 should have 108 SMs, while the warning uses a threshold of 68, as seen here.
Did you enable MIG and reduce the SM count?
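
To check this quickly, you could print the SM count that PyTorch actually sees for your device. Below is a minimal sketch: `torch.cuda.get_device_properties(...).multi_processor_count` is the real PyTorch attribute, while `MIN_SMS`, `enough_sms` and `visible_sm_count` are hypothetical names I made up for illustration. The value 68 is simply the threshold quoted above and may differ in your PyTorch version.

```python
from typing import Optional

# Threshold quoted in the reply above (assumption: it may differ between PyTorch versions).
MIN_SMS = 68


def enough_sms(sm_count: int, min_sms: int = MIN_SMS) -> bool:
    """Hypothetical helper: True if the SM count meets the quoted threshold."""
    return sm_count >= min_sms


def visible_sm_count(device_index: int = 0) -> Optional[int]:
    """Return the SM count PyTorch reports for a CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.multi_processor_count


if __name__ == "__main__":
    count = visible_sm_count()
    if count is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(f"visible SMs: {count}, meets threshold: {enough_sms(count)}")
```

If this reports well below 108 on an A100, the process is probably running on a MIG slice or some other partition of the GPU rather than on the full device.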