I would like to initialise a multi-channel 2d convolution layer such that it simply replicates the input (identity). For a single channel image I know the identity kernel is:
[0, 0, 0
0, 1, 0
0, 0, 0]
But how can I do this for a 3 channel image?
The input is [b, 3, h, w] and the output is also [b, 3, h, w], so my weight size would be [3, 3, 3, 3].
My best guess, which doesn't work (x should equal y):
import torch
from torch import nn

x = torch.rand(1, 3, 3, 3)
w = torch.tensor([
[0., 0., 0.,],
[0., 1., 0.,],
[0., 0., 0.,],
]).view(1, 1, 3, 3).repeat(3, 3, 1, 1)
y = nn.functional.conv2d(x, w, bias=None, stride=1, padding=1, dilation=1)
print(torch.allclose(x, y)) # prints False
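For comparison, here is a minimal sketch of a weight that should behave as an identity, assuming each output channel is meant to copy only its matching input channel. The conv2d weight layout is [out_channels, in_channels, kH, kW], so repeating the same kernel across both leading dimensions makes every output channel the sum of all three input channels. Placing the centre 1 only where the output and input channel indices match avoids that. The names `w_id` and `w_dirac` are illustrative only.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 3, 3)

# Weight layout: [out_channels, in_channels, kH, kW].
# Start from all zeros, then set the centre tap to 1 only on the "diagonal",
# i.e. where the output channel index equals the input channel index.
w_id = torch.zeros(3, 3, 3, 3)
for c in range(3):
    w_id[c, c, 1, 1] = 1.0

y = F.conv2d(x, w_id, bias=None, stride=1, padding=1, dilation=1)
print(torch.allclose(x, y))  # prints True

# torch.nn.init.dirac_ fills a conv weight with the same pattern in place.
w_dirac = torch.empty(3, 3, 3, 3)
torch.nn.init.dirac_(w_dirac)
print(torch.equal(w_id, w_dirac))  # prints True
```

The same idea should carry over to an `nn.Conv2d` layer by applying `torch.nn.init.dirac_` to its `weight` and either zeroing the bias or constructing the layer with `bias=False`.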