How does PyTorch use unfold in convolution?

Hi, I've recently been implementing a custom convolution and think I may need nn.Unfold. While experimenting, I found that nn.Unfold produces a tensor much larger than its input: with a kernel size of 3, the output is nearly 9 times the size of the input, which costs a lot of memory. However, nn.Conv2d uses much less memory for the same input size, even though I think I found something like unfold in the C implementation. So I wonder how PyTorch uses unfold internally, and whether it is possible to imitate that using the functions PyTorch provides.
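To make the question concrete, here is a minimal sketch of what I mean by a convolution built on unfold. It only shows that `F.unfold` followed by a matrix multiply matches `F.conv2d` numerically, and how large the unfolded tensor gets; it is not meant to describe how PyTorch's own backends work. The tensor sizes and variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
N, C_in, C_out, H, W, k = 2, 3, 4, 8, 8, 3

x = torch.randn(N, C_in, H, W)
weight = torch.randn(C_out, C_in, k, k)
bias = torch.randn(C_out)

# Reference result from the built-in convolution.
ref = F.conv2d(x, weight, bias, padding=1)

# im2col step: each column holds one k*k patch across all input channels.
cols = F.unfold(x, kernel_size=k, padding=1)  # shape (N, C_in*k*k, H*W)

# Convolution expressed as a matrix multiply over the unfolded patches.
out = weight.view(C_out, -1) @ cols + bias.view(1, -1, 1)  # (N, C_out, H*W)
out = out.view(N, C_out, H, W)

# Largest elementwise difference from the built-in result.
max_err = (out - ref).abs().max().item()

# How many times larger the unfolded tensor is than the input.
ratio = cols.numel() / x.numel()
print(f"max abs error: {max_err:.2e}, unfolded/input size ratio: {ratio}")
```

With `padding=1` and a 3x3 kernel, `cols` holds exactly `k*k` times as many elements as `x`, which is the memory blow-up described above. One way to limit the peak size of that buffer is to unfold and multiply a few samples of the batch at a time rather than the whole batch at once.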

A similar topic with a useful discussion is available here