I’m working on a CNN that directly processes each patch. After reading the documentation on fold and unfold, my understanding is that I can first apply a convolution to an arbitrary `[b, c, h, w]` input named `A`, with some parameters for stride, dilation, and padding. Let’s call the convolution’s output `B`, with shape `[b, c, h1, w1]`.

My understanding of how fold and unfold work is as follows: if I unfold the input `A`, I get something of shape `[b, H, L]`. I can then apply some transformation along the `H` dimension, and finally use fold with `B`’s spatial shape as the `output_size` parameter, passing the same stride, dilation, and padding.
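In code, my mental model is the following (toy sizes, just to check the shapes I expect):

```
import torch
import torch.nn.functional as F

A = torch.randn(2, 3, 8, 8)                     # [b, c, h, w]
windows = F.unfold(A, kernel_size=3, stride=1)  # [b, c*3*3, L]
print(windows.shape)                            # torch.Size([2, 27, 36]), L = 6*6 sliding positions
```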

However, it doesn’t seem to work this way. If my understanding were correct, the following example should execute without a problem, but it doesn’t run. Can someone point out what I missed?

```
import torch
import torch.nn as nn

A = torch.randn(4, 32, 224, 224)
# spatial shape of the convolution output: torch.Size([110, 110])
out_shape = nn.Conv2d(32, 32, kernel_size=3, dilation=2, stride=2, padding=0)(A).shape[-2:]
windows = torch.nn.functional.unfold(A, kernel_size=3, dilation=2, stride=2, padding=0)
# this line raises a RuntimeError about a mismatched L
A_ = torch.nn.functional.fold(windows, output_size=out_shape, kernel_size=3, dilation=2, stride=2, padding=0)
```

This complains that fold with output size 110 would require a size of L=2809, while my `windows` tensor has L=12100, which is 110**2. I then tried folding with `output_size=224` instead, which doesn’t complain anymore but gives me back a shape of `[4, 32, 224, 224]`.
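To double-check what fold actually inverts, I also tried a round trip with a simpler, fully-covering configuration (kernel_size=3, stride=1, padding=1, so every pixel is hit by at least one patch). Per the fold docs, overlapping values are summed, so dividing by a per-pixel patch count recovers the input:

```
import torch
import torch.nn.functional as F

A = torch.randn(4, 32, 224, 224)
kw = dict(kernel_size=3, stride=1, padding=1)

# fold sums overlapping patch entries, so this is A scaled by per-pixel patch counts
summed = F.fold(F.unfold(A, **kw), output_size=(224, 224), **kw)

# count how many patches cover each pixel, then normalize
counts = F.fold(F.unfold(torch.ones_like(A), **kw), output_size=(224, 224), **kw)
print(torch.allclose(summed / counts, A, atol=1e-5))  # True
```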