Unfold is slower than Convolution


I’m trying to implement a custom convolution layer in PyTorch and need im2col functionality to convert the convolution into a matrix multiplication. To do so, I am using the Unfold class (torch.nn.Unfold). When I benchmarked it, Unfold by itself turned out to be slower than Conv2d. I called torch.cuda.synchronize() to make sure the timings were measured properly. This is a problem for me, since my goal is to eventually speed up the convolutional layer. Is there any way to do the Unfold operation more efficiently in PyTorch? I'd appreciate any tips.
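
For reference, a comparison of this kind can be set up roughly as follows. This is only a sketch, not the exact benchmark from the question: the tensor sizes, the `time_op` helper, and the warmup/iteration counts are assumptions chosen for illustration, and the script falls back to CPU when no GPU is available.

```python
import time

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"


def time_op(fn, warmup=3, iters=10):
    """Return the mean wall-clock time of fn() in milliseconds (hypothetical helper)."""
    def sync():
        # CUDA kernels launch asynchronously, so wait for them before reading the clock.
        if device == "cuda":
            torch.cuda.synchronize()

    for _ in range(warmup):
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()
    return (time.perf_counter() - start) * 1e3 / iters


# Illustrative sizes only; real workloads will differ.
x = torch.randn(8, 16, 32, 32, device=device)
conv = torch.nn.Conv2d(16, 32, kernel_size=3, padding=1).to(device)

with torch.no_grad():
    t_conv = time_op(lambda: conv(x))
    t_unfold = time_op(lambda: F.unfold(x, kernel_size=3, padding=1))

print(f"conv2d: {t_conv:.3f} ms, unfold only: {t_unfold:.3f} ms")
```

The warmup iterations are there so that one-time costs (memory allocation, algorithm selection) are not counted in the measurement.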


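This is expected, since unfold explicitly materializes the im2col matrix so that you can perform the matmul afterwards. Each input element is copied into every patch that contains it, so for a k x k kernel with stride 1 the unfolded tensor is roughly k^2 times larger than the input, and writing it out costs memory bandwidth before any multiplication has happened. Libraries such as cuDNN ship with tuned algorithms (for common conv layer setups) to speed up the operation.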
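
To make the cost of the explicit matrix concrete, here is a minimal sketch of a convolution written as unfold followed by matmul, checked against `F.conv2d`. The shapes are made up for illustration, and stride 1 with no padding is assumed.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes chosen only for illustration.
N, C_in, H, W = 2, 3, 8, 8
C_out, k = 4, 3

x = torch.randn(N, C_in, H, W)
weight = torch.randn(C_out, C_in, k, k)

# Reference: the built-in convolution (stride 1, no padding).
ref = F.conv2d(x, weight)

# im2col: every k x k patch becomes one column.
# Shape: (N, C_in * k * k, L), where L is the number of output positions.
cols = F.unfold(x, kernel_size=k)

# Convolution as a matrix multiplication over the unfolded patches.
out = weight.view(C_out, -1) @ cols  # (N, C_out, L)
H_out, W_out = H - k + 1, W - k + 1
out = out.view(N, C_out, H_out, W_out)

matches = torch.allclose(out, ref, atol=1e-4)

# How much larger the unfolded tensor is than the input.
blowup = cols.numel() / x.numel()

print(matches, tuple(cols.shape), blowup)
```

With these sizes the unfolded tensor holds about five times as many elements as the input, and the ratio approaches k^2 = 9 as the image grows relative to the kernel. That extra tensor has to be written and then read back by the matmul, which is the overhead that shows up when timing Unfold on its own.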