Thank you very much. I think this is what I was looking for. But I am a bit confused by the provided example:
>>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
>>> inp = torch.randn(1, 3, 10, 12)
>>> w = torch.randn(2, 3, 4, 5)
>>> inp_unf = torch.nn.functional.unfold(inp, (4, 5))
>>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
>>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
>>> # or equivalently (and avoiding a copy),
>>> # out = out_unf.view(1, 2, 7, 8)
>>> (torch.nn.functional.conv2d(inp, w) - out).abs().max()
tensor(1.9073e-06)
In this example, I’m not sure I understand why `torch.nn.functional.fold` is equivalent to using `view`. Here `out_unf` is not contiguous (`out_unf.is_contiguous()` is `False`) because of the last `transpose`. My questions are:
a) In what cases are they equivalent?
b) Shouldn’t `.view(...)` raise `RuntimeError: input is not contiguous`, since `out_unf` is not contiguous?
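For reference, here is a minimal standalone reproduction of the setup above that checks the contiguity claim (the seed is my addition, not part of the original example):

```python
import torch

torch.manual_seed(0)  # added for reproducibility; not in the quoted example

inp = torch.randn(1, 3, 10, 12)
w = torch.randn(2, 3, 4, 5)

# Unfold + matmul, exactly as in the quoted example
inp_unf = torch.nn.functional.unfold(inp, (4, 5))          # (1, 60, 56)
out_unf = inp_unf.transpose(1, 2).matmul(
    w.view(w.size(0), -1).t()
).transpose(1, 2)                                          # (1, 2, 56)

# The trailing transpose makes the result non-contiguous
print(out_unf.is_contiguous())  # False

# fold reassembles the (7, 8) output spatial shape
out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
print((torch.nn.functional.conv2d(inp, w) - out).abs().max())
```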