For an input image x and a convolution layer w, the output is y = w(x).

Let’s say we expand x as a giant column vector X and w as a giant matrix W so that Y=WX.
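To make the matrix view concrete, here is a minimal sketch that materializes W for a deliberately tiny convolution by pushing every unit input through `F.conv2d`. The sizes (2 input channels, 3 output channels, a 3×3 kernel, a 5×5 image), the helper `conv`, and the use of double precision are illustrative assumptions, not part of the setup above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Tiny, made-up sizes so the dense matrix stays small.
c_in, c_out, k, h = 2, 3, 3, 5
w = torch.randn(c_out, c_in, k, k, dtype=torch.double)  # conv2d layout: (out, in, kH, kW)
x = torch.randn(1, c_in, h, h, dtype=torch.double)

def conv(inp):
    # "Same" convolution: stride 1, padding (k - 1) / 2.
    return F.conv2d(inp, w, padding=k // 2)

# Column j of W is the flattened response to the j-th unit input.
n_in = c_in * h * h
basis = torch.eye(n_in, dtype=torch.double).reshape(n_in, c_in, h, h)
W = conv(basis).reshape(n_in, -1).T  # shape (c_out*h*h, c_in*h*h)

# Y = W X, with X and Y the flattened image and output.
y = conv(x)
forward_matches = torch.allclose(W @ x.flatten(), y.flatten())
```

With W available as a dense matrix, `W.T @ y.flatten()` gives a ground truth to compare any candidate implementation of W^T against, at least at toy sizes like these.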

Now we want to compute W^T(WX). Should we use F.conv_transpose2d, or F.conv2d with a transposed weight tensor? The two give different values: in the snippet below, the second assertion fails.

```
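import torch
import torch.nn.functional as F

i = torch.randn(16, 3, 32, 32)
w = torch.randn(3, 32, 5, 5)
# conv2d with the two channel axes of w swapped: weight shape (32, 3, 5, 5)
o1 = F.conv2d(i, w.transpose(1, 0), padding=2)
# conv_transpose2d reads w as (in_channels, out_channels, kH, kW)
o2 = F.conv_transpose2d(i, w, padding=2)
assert o1.shape == o2.shape, "shapes don't match!"
# compare absolute differences; a signed difference would let large negative gaps pass
assert torch.all((o1 - o2).abs() < 1e-3), "values aren't equal!"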
```

I found that with a 1×1 kernel and padding=0 the second assertion passes. With a larger kernel and padding=(kernel_size-1)/2 to preserve the spatial shape, it fails. How can I solve this?
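One likely explanation, offered as a sketch rather than a definitive answer: swapping the channel axes is not the whole transpose. For stride 1, `F.conv_transpose2d` also corresponds to rotating each kernel by 180 degrees, which is a no-op for a 1×1 kernel and would explain why that case passes. The check below reuses the shapes from the snippet above; the names `w_flipped` and `z`, the double precision, and the autograd cross-check are additions for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
i = torch.randn(16, 3, 32, 32, dtype=torch.double)
w = torch.randn(3, 32, 5, 5, dtype=torch.double)

o2 = F.conv_transpose2d(i, w, padding=2)

# Swap the channel axes AND rotate each 5x5 kernel by 180 degrees.
w_flipped = w.transpose(0, 1).flip(2, 3)
flip_matches = torch.allclose(F.conv2d(i, w_flipped, padding=2), o2)

# Channel swap alone (the original comparison) leaves the kernels unflipped.
plain_matches = torch.allclose(F.conv2d(i, w.transpose(0, 1), padding=2), o2)

# Cross-check with autograd: the vector-Jacobian product of conv2d with respect
# to its input is W^T applied to the incoming vector. Here w acts as a conv2d
# weight mapping 32 channels to 3, so W^T maps 3 channels back to 32.
z = torch.zeros(16, 32, 32, 32, dtype=torch.double, requires_grad=True)
(vjp,) = torch.autograd.grad(F.conv2d(z, w, padding=2), z, grad_outputs=i)
adjoint_matches = torch.allclose(vjp, o2)
```

If `flip_matches` and `adjoint_matches` hold while `plain_matches` does not, then `F.conv_transpose2d(y, w, ...)` with the same weight, stride, and padding as the forward `F.conv2d` is the operation that applies W^T, and the `F.conv2d` route needs the extra `.flip(2, 3)` to agree with it. Note that this equivalence is only checked here for stride 1 with padding (kernel_size-1)/2.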