# Need help understanding Conv2d and fold, unfold

I’m working on a CNN that directly processes each patch. After reading the documentation on `fold` and `unfold`, my understanding is that I can first apply a convolution to an arbitrary `[b, c, h, w]` input named `A`, with some parameters for stride, dilation, and padding. Let’s call the output `B`, with shape `[b, c, h1, w1]`.

My understanding of how fold and unfold work is as follows: if I unfold the input `A`, I get something of shape `[b, H, L]`. I can then apply some transformation along the `H` dimension and use fold with `B`’s spatial shape as the `output_size` parameter, passing the same stride and dilation parameters.

However, it doesn’t seem to work this way. If my understanding were correct, the following example should execute without a problem, but it doesn’t run. Can someone point out what I missed?

```python
import torch
from torch import nn

A = torch.randn(4, 32, 224, 224)
out_shape = nn.Conv2d(32, 32, kernel_size=3, dilation=2, stride=2, padding=0)(A).shape[-2:]
windows = torch.nn.functional.unfold(A, kernel_size=3, dilation=2, stride=2, padding=0)
A_ = torch.nn.functional.fold(windows, output_size=out_shape, kernel_size=3, dilation=2, stride=2, padding=0)
```

This complains that fold with output size 110 would require L=2809, while the `windows` variable has L=12100, which is 110**2. I then tried folding with output_size=224 instead, which no longer complains, but it gives me back a shape of `[4, 32, 224, 224]`, i.e. the original input shape.
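For reference, both `L` values in that error message follow the sliding-window count formula from the `Unfold` docs. A quick sketch of the arithmetic (the `num_blocks` helper name is my own, not part of PyTorch):

```python
import math

def num_blocks(size, kernel_size=3, dilation=2, stride=2, padding=0):
    # Number of sliding windows that fit along one spatial dimension,
    # per the formula in the torch.nn.Unfold documentation.
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(num_blocks(224) ** 2)  # 12100 -> L of unfold(A) for a 224x224 input
print(num_blocks(110) ** 2)  # 2809  -> L that fold expects for output_size=110
```

So fold with `output_size=110` expects windows extracted *from* a 110x110 tensor, not windows that would *produce* a 110x110 output.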

Take a look at this documentation for the non-functional version:

https://pytorch.org/docs/stable/generated/torch.nn.Fold.html#torch.nn.Fold

In particular, note:

```
fold(unfold(input)) == divisor * input
```

I believe `Fold` would restore the original input shape rather than yield the output shape of the corresponding convolution.
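That identity is easy to check directly. Here is a minimal sketch, where the `divisor` tensor is obtained, as the docs suggest, by running fold∘unfold on an all-ones input (it counts how many windows cover each pixel):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 10, 10)
params = dict(kernel_size=3, dilation=2, stride=2, padding=0)

# fold(unfold(x)) restores the *input* spatial size, scaled per-pixel.
folded = F.fold(F.unfold(x, **params), output_size=x.shape[-2:], **params)

# Per-pixel window-coverage counts, computed on an all-ones tensor.
divisor = F.fold(F.unfold(torch.ones_like(x), **params), output_size=x.shape[-2:], **params)

print(torch.allclose(folded, divisor * x))  # True
```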

I see, now that you mention it the doc does state that it restores the original size. Do you know of a way to preserve the effect of parameters such as dilation and stride, so that unfold followed by fold behaves almost like a Conv2d module?

I would think about your problem from the perspective that some sort of reduction needs to happen to transform the Unfold output into the equivalent of a conv output. This example may not be the exact operation you’re trying to implement, but it illustrates the basics:

```python
import torch
from torch import nn

A = torch.ones(4, 32, 224, 224)
conv = nn.Conv2d(32, 1, kernel_size=3, dilation=2, stride=2, padding=0, bias=False)
# With all-ones weights, each output pixel is just the sum over its input window.
conv.weight = torch.nn.Parameter(torch.ones_like(conv.weight))
out = conv(A)
print(out.shape)
windows = torch.nn.functional.unfold(A, kernel_size=3, dilation=2, stride=2, padding=0)
print(windows.shape)
# Unflatten L back into the conv's spatial grid: [b, 1, c * k * k, h1, w1].
windows = windows.reshape(4, 1, 32 * 9, out.shape[-2], out.shape[-1])
# Summing over the window dimension mirrors the all-ones convolution.
windows_reduced = torch.sum(windows, dim=2)
print(windows_reduced.shape)
print(torch.allclose(out, windows_reduced))
```
```
torch.Size([4, 1, 110, 110])
torch.Size([4, 288, 12100])
torch.Size([4, 1, 110, 110])
True
```

Note that the sum reduction above reproduces a convolution with just a single output channel (and all-ones weights); unfold itself only extracts the windows.
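To go beyond a single channel, one option (a sketch I’m adding here, not from the thread) is to flatten the conv’s weights into a matrix and multiply it against the unfolded windows; this reproduces a full multi-channel Conv2d, assuming no bias:

```python
import torch
from torch import nn

A = torch.randn(4, 32, 224, 224)
conv = nn.Conv2d(32, 16, kernel_size=3, dilation=2, stride=2, padding=0, bias=False)
out = conv(A)  # [4, 16, 110, 110]

windows = torch.nn.functional.unfold(A, kernel_size=3, dilation=2, stride=2, padding=0)  # [4, 288, 12100]
w = conv.weight.view(16, -1)  # flatten each kernel: [16, 32 * 3 * 3] = [16, 288]
# Batched matmul broadcasts w over the batch: [16, 288] @ [4, 288, 12100] -> [4, 16, 12100].
out2 = (w @ windows).view(4, 16, *out.shape[-2:])

print(torch.allclose(out, out2, atol=1e-4))
```

The `atol` accounts for the two code paths accumulating floating-point sums in different orders.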

Thank you for the detailed reply. I can see this solution working, and I’ll go with something along these lines.