Why doesn't pointwise nn.Conv1d use matmul kernels?

An example is in CPU implementation of Conv1d seems to work non-deterministically · Issue #116369 · pytorch/pytorch · GitHub. In both cases there, conv2d kernels are used.

What kernel would be used for torch.matmul with transposed inputs?

I think that, in theory, both should use the same gemm path that accepts and produces transposed values (ideally, the choice should depend on the "actual" memory contiguity format?).
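To make the equivalence I have in mind concrete, here is a minimal sketch (assuming kernel_size=1, stride=1, no padding, and illustrative shapes): a pointwise Conv1d is just a matmul over the channel dimension applied to transposed views of the input.

```python
import torch
import torch.nn as nn

# Pointwise Conv1d (kernel_size=1) computes, for each position t,
# out[n, :, t] = W @ x[n, :, t] + b -- i.e. a matmul over channels.
n, c_in, c_out, length = 4, 16, 32, 100
x = torch.randn(n, c_in, length)

conv = nn.Conv1d(c_in, c_out, kernel_size=1, bias=True)
y_conv = conv(x)

# The same computation as a batched matmul on transposed views:
# (N, L, C_in) @ (C_in, C_out) -> (N, L, C_out), then transpose back.
w = conv.weight.squeeze(-1)                      # (C_out, C_in)
y_mm = (torch.matmul(x.transpose(1, 2), w.t()) + conv.bias).transpose(1, 2)

print(torch.allclose(y_conv, y_mm, atol=1e-5))   # True up to float rounding
```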

Hey!

My expectation is that historically, the conv2d kernel was the most optimized of them all and so it was the simplest and overall fastest fallback.
But if there are easy-to-identify cases where matmul is faster, then we can definitely add a branch there to call matmul instead!


If gemm is not faster for these stride patterns, then matmul itself should use this conv path :slight_smile:
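For anyone who wants to check this on their own machine, a rough micro-benchmark sketch using torch.utils.benchmark (sizes and iteration counts are arbitrary, and results will depend on the CPU and the BLAS/oneDNN build PyTorch was compiled against):

```python
import torch
import torch.nn as nn
import torch.utils.benchmark as benchmark

# Arbitrary problem size; timings vary across machines and builds.
n, c_in, c_out, length = 8, 256, 256, 4096
x = torch.randn(n, c_in, length)
conv = nn.Conv1d(c_in, c_out, kernel_size=1, bias=False)
w = conv.weight.squeeze(-1)  # (C_out, C_in)

# Time the existing conv path vs. an explicit matmul on transposed views.
t_conv = benchmark.Timer(stmt="conv(x)", globals={"conv": conv, "x": x})
t_mm = benchmark.Timer(
    stmt="torch.matmul(x.transpose(1, 2), w.t()).transpose(1, 2)",
    globals={"torch": torch, "x": x, "w": w},
)

print(t_conv.timeit(50))
print(t_mm.timeit(50))
```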

I opened [discussion] Route pointwise Conv1d/Conv2d to matmul? · Issue #116506 · pytorch/pytorch · GitHub for further discussion.
