Hi,
I’m trying to implement a custom convolution layer in PyTorch, and I need im2col functionality to convert the convolution into a matrix multiplication. To do so, I’m using PyTorch’s Unfold class. I benchmarked the performance, and Unfold by itself is already slower than Conv2d. I used torch.cuda.synchronize() around the timed regions to make sure the benchmarking was done properly. This is a problem for me, since my goal is eventually to speed up the convolution layer. Is there any way to do the Unfold operation more efficiently in PyTorch? Appreciate any tips.
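In case it helps to see what I mean, here is a minimal CPU sketch of the unfold-then-matmul approach I’m benchmarking (the shapes are made up for illustration):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)  # input: (N, C, H, W)
w = torch.randn(4, 3, 3, 3)  # weights: (out_channels, C, kH, kW)

# im2col: each column holds one 3x3 patch, flattened to C*kH*kW values.
# Result shape: (N, C*kH*kW, L), where L is the number of sliding windows.
cols = F.unfold(x, kernel_size=3, padding=1)

# Convolution as a GEMM: flatten the kernel to (out_channels, C*kH*kW)
# and multiply against the unfolded patches.
out = w.view(4, -1) @ cols      # (N, out_channels, L)
out = out.view(2, 4, 8, 8)      # reshape back to the spatial layout

# Matches the built-in convolution up to floating-point tolerance.
ref = F.conv2d(x, w, padding=1)
print(torch.allclose(out, ref, atol=1e-5))
```

The results agree with F.conv2d, but the unfold step materializes every patch in memory, which is where the time (and memory) goes in my benchmarks.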
Thanks.