Small depthwise Conv1d: maximum perf on CPU?

If you have a small depthwise conv1d, then IMO it's better to just hand-roll something and tune it with Inductor / TVM, or write a pass for Inductor CPU. I think you'll get a lot more mileage out of that, because these are basically bandwidth-bound computations: each output element needs only a handful of multiply-adds per value loaded.
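To make the bandwidth-bound point concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes float32 data, stride 1, no padding, and that each tensor is read or written once from memory (so no cache reuse is modeled). The function names `depthwise_conv1d_ref` and `arithmetic_intensity` are made up for illustration and are not part of PyTorch, Inductor, or TVM.

```python
def depthwise_conv1d_ref(x, w):
    """Naive depthwise conv1d (cross-correlation), stride 1, no padding.

    x: list of C channels, each a list of L floats.
    w: list of C kernels, each a list of K floats (one kernel per channel).
    Returns a list of C channels, each of length L - K + 1.
    """
    out = []
    for xc, wc in zip(x, w):
        k = len(wc)
        # Each output element costs K multiplies and K adds.
        out.append([
            sum(xc[i + j] * wc[j] for j in range(k))
            for i in range(len(xc) - k + 1)
        ])
    return out


def arithmetic_intensity(n, c, length, k, bytes_per_elem=4):
    """Estimated FLOPs per byte of memory traffic for a depthwise conv1d.

    Assumption: input, weights, and output each cross the memory bus once.
    """
    l_out = length - k + 1
    flops = 2 * n * c * l_out * k  # one multiply + one add per tap
    bytes_moved = bytes_per_elem * (n * c * length + c * k + n * c * l_out)
    return flops / bytes_moved


if __name__ == "__main__":
    print(depthwise_conv1d_ref([[1, 2, 3, 4]], [[1, 0, -1]]))
    print(arithmetic_intensity(n=1, c=64, length=1024, k=3))
```

For a kernel size of 3 this works out to roughly 0.75 FLOP per byte, i.e. about K/4 for long sequences. That is low enough that, under these assumptions, memory traffic rather than arithmetic is likely the limit once the working set no longer fits in cache, which is why fusing the conv with its neighbours tends to matter more than speeding up the multiply-adds.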
