I created a CNN model that takes as input a 3x512x512 image, and reduces it, through convolutions and maxpoolings, to a 1024x32x32 cube. In this cube I am using a 1x1 kernel conv layer to make a binary classification and a (x, y) regression for each of the 32x32 cells.
So I have a Conv2d layer like this:
predConv = nn.Conv2d(1024, 3, kernel_size=1, padding=0)
My question is: is there any difference between using a single conv layer with 3 kernels (the first for classification, and the last two for regression), and using two conv layers, one with 1 kernel (for classification) and another with 2 kernels (for regression), like this:
predClass = nn.Conv2d(1024, 1, kernel_size=1, padding=0)
predXYCoord = nn.Conv2d(1024, 2, kernel_size=2, padding=0)
I think the answer is that there is no difference, but my classification is getting good accuracy (0.85) while the regression is outputting nonsense values. I am using BCEWithLogitsLoss and MSELoss as loss functions.
Thank you very much, and sorry for the silly question.
Both approaches should yield the same result, if you are using the same setup besides the split into two layers.
Note, however, that predXYCoord uses kernel_size=2, while the other layers use a kernel size of 1.
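To make the effect of the kernel size concrete, here is a minimal sketch of the output shapes. The batch size of 1, the random feature map, and the variable names are placeholders, not the setup from the question:

```python
import torch
import torch.nn as nn

# Placeholder feature map with the shape described in the question
x = torch.randn(1, 1024, 32, 32)

pred_class = nn.Conv2d(1024, 1, kernel_size=1, padding=0)
pred_xy_k1 = nn.Conv2d(1024, 2, kernel_size=1, padding=0)
pred_xy_k2 = nn.Conv2d(1024, 2, kernel_size=2, padding=0)

with torch.no_grad():
    class_shape = tuple(pred_class(x).shape)
    xy_k1_shape = tuple(pred_xy_k1(x).shape)
    xy_k2_shape = tuple(pred_xy_k2(x).shape)

print(class_shape)  # (1, 1, 32, 32)
print(xy_k1_shape)  # (1, 2, 32, 32)
print(xy_k2_shape)  # (1, 2, 31, 31)
```

With kernel_size=2 and no padding, the regression map shrinks to 31x31, so it no longer lines up with the 32x32 classification cells.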
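Here is a small example comparing the gradients of both setups:
import torch
import torch.nn as nn

# Single layer
input = torch.randn(1, 10, 32, 32, requires_grad=True)
conv1 = nn.Conv2d(10, 3, 3, 1, 1, bias=False)
out1 = conv1(input)
out1.mean().backward()
input_grad_reference = input.grad.clone()
conv_grad_reference = conv1.weight.grad.clone()
# Two layers, initialized with the same weights as conv1
conv2 = nn.Conv2d(10, 2, 3, 1, 1, bias=False)
conv3 = nn.Conv2d(10, 1, 3, 1, 1, bias=False)
with torch.no_grad():
    conv2.weight.copy_(conv1.weight[:2])
    conv3.weight.copy_(conv1.weight[2:])
input.grad.zero_()  # reset the gradient accumulated by the first backward pass
out2 = conv2(input)
out3 = conv3(input)
torch.cat((out2, out3), dim=1).mean().backward()
# allclose instead of == to tolerate floating point rounding differences
print(torch.allclose(input.grad, input_grad_reference))
print(torch.allclose(conv2.weight.grad, conv_grad_reference[:2]))
print(torch.allclose(conv3.weight.grad, conv_grad_reference[2:]))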
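The same kind of check can be sketched for the forward pass with the 1x1 layers from the question. This assumes the two separate layers are given the same weights and biases as the single layer. The batch size, the seed, and the snake_case names are placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 1024, 32, 32)  # placeholder feature map

pred_conv = nn.Conv2d(1024, 3, kernel_size=1, padding=0)
pred_class = nn.Conv2d(1024, 1, kernel_size=1, padding=0)
pred_xy = nn.Conv2d(1024, 2, kernel_size=1, padding=0)

with torch.no_grad():
    # Channel 0 of the single layer is the classifier, channels 1-2 the regressor
    pred_class.weight.copy_(pred_conv.weight[:1])
    pred_class.bias.copy_(pred_conv.bias[:1])
    pred_xy.weight.copy_(pred_conv.weight[1:])
    pred_xy.bias.copy_(pred_conv.bias[1:])
    out_single = pred_conv(x)
    out_split = torch.cat((pred_class(x), pred_xy(x)), dim=1)

# Compare with a small tolerance to allow for floating point rounding
same_outputs = torch.allclose(out_single, out_split, atol=1e-5)
print(same_outputs)
```

Each output channel of a conv layer is computed from its own kernel only, which is why splitting the kernels across two layers should not change the result.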
Thank you very much for your answer. I was 99% sure that they yield the same result, I just could not find the error in my code. But I think I have just found it.
I was permuting the output channels after separating them. Like this:
classification = out[:, 0] # (N, 32, 32)
regression = out[:, 1:] # (N, 2, 32, 32)
regression = regression.permute(0, 2, 3, 1).contiguous() # (N, 32, 32, 2)
Now I am permuting before separating the classification and regression outputs, and the regression results are meaningful.
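A minimal sketch of what permuting first and splitting afterwards might look like. The random tensor stands in for the output of the prediction layer, and the batch size of 4 is a placeholder:

```python
import torch

out = torch.randn(4, 3, 32, 32)  # placeholder for the prediction layer output

# Move the channel dimension to the end, then split along it
out = out.permute(0, 2, 3, 1).contiguous()  # (N, 32, 32, 3)
classification = out[..., 0]   # (N, 32, 32)
regression = out[..., 1:]      # (N, 32, 32, 2)

print(classification.shape, regression.shape)
```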
By the way, the kernel_size=2 on my question was a typo, it should be 1 on both layers. Is there any reason why you used kernel_size=3 in your examples?
Thank you again!
No, it’s just my default setup for debugging without any particular reason.
I’ve probably seen VGG/AlexNet etc. too often and just write the default conv layer.