Many thanks for the comment!
In my application, torch.matmul and torch.linalg.solve are the most time-consuming parts. There I got a ~2.6x speed-up with the M1 Pro vs. the i7-11800H (and more vs. older Intel CPUs).
However, this is nowhere near the speed-up from recent Nvidia GPUs (~13.5x with a 130 W laptop Nvidia RTX 3070 vs. the i7-11800H, and more with e.g. an A100).
So I was hoping for a performance boost from the new release.
Is AMX already used in PyTorch? I saw you posted a question asking that, and also this comment on GitHub.
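
In case it helps anyone compare numbers, here is a rough sketch of how timings like the ones above could be measured on the CPU. It is not the exact script I used: the helper names (median_seconds, speedup), the matrix size n = 512, and the repeat counts are all made up for illustration.

```python
import statistics
import time


def median_seconds(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of fn() over `repeats` runs."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


try:
    import torch
except ImportError:  # the timing helpers above still work without PyTorch
    torch = None

if torch is not None:
    n = 512  # hypothetical problem size, not from my application
    a = torch.rand(n, n)
    # Adding n * I keeps the linear system well-conditioned.
    A = torch.rand(n, n) + n * torch.eye(n)
    b = torch.rand(n, 1)

    t_matmul = median_seconds(lambda: torch.matmul(a, a))
    t_solve = median_seconds(lambda: torch.linalg.solve(A, b))
    print(f"matmul: {t_matmul * 1e3:.2f} ms, solve: {t_solve * 1e3:.2f} ms")
```

Running this on two machines and passing the two medians to speedup() gives a ratio of the same kind as the ~2.6x above. Note that this only times synchronous CPU ops; on a GPU the kernels run asynchronously, so you would need to synchronize (e.g. torch.cuda.synchronize() on CUDA) before reading the clock.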