Do we need to permute dimensions in a CNN?

xian_kgx · February 28, 2019, 3:00am

Suppose I have a CNN with conv layers and fully connected layers. Since image tensors in PyTorch are C * H * W, do we need to permute the dimensions of the feature maps after the final conv layer?

I’m thinking this might not be necessary since we have fully connected layers and the neurons in those layers will learn to pick the correct tensor values from the convolution feature maps during training.

But I could be wrong. Please advise.

MariosOreo · February 28, 2019, 3:43am

Hi,

What does “permute the dimensions” mean?
As far as I know, in TensorFlow there should be a flatten operation between the conv layers and the fc layers, and I think it is similar in PyTorch.

xian_kgx · February 28, 2019, 5:30am

Yup, there should be a flatten operation. But do we need to first change the dimensions from (batch,c,h,w) to (batch,h,w,c) before flattening?

MariosOreo · February 28, 2019, 5:47am

Of course not. The default data format in PyTorch is NCHW.
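To make this concrete, here is a minimal sketch (toy shapes and variable names are my own, not from the thread) showing that permuting before the flatten only reorders the flattened features. A fully connected layer whose weight columns are reordered the same way gives the same output, which suggests the fc layer can learn either ordering equally well.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes, chosen only for illustration.
batch, c, h, w = 2, 4, 3, 3
feat = torch.randn(batch, c, h, w)  # conv output in NCHW

# Option A: flatten directly in (c, h, w) order.
flat_chw = feat.flatten(start_dim=1)                      # (batch, c*h*w)
# Option B: permute to NHWC first, then flatten in (h, w, c) order.
flat_hwc = feat.permute(0, 2, 3, 1).flatten(start_dim=1)  # (batch, h*w*c)

# perm[j] is the CHW-flattened index of the element at HWC-flattened position j.
perm = torch.arange(c * h * w).view(c, h, w).permute(1, 2, 0).flatten()

fc_chw = nn.Linear(c * h * w, 8)
fc_hwc = nn.Linear(c * h * w, 8)
with torch.no_grad():
    # Reorder the weight columns to match the HWC feature order.
    fc_hwc.weight.copy_(fc_chw.weight[:, perm])
    fc_hwc.bias.copy_(fc_chw.bias)

same_values = torch.equal(flat_hwc, flat_chw[:, perm])
same_output = torch.allclose(fc_chw(flat_chw), fc_hwc(flat_hwc), atol=1e-5)
print(same_values, same_output)
```

So the permute is not needed for correctness when training from scratch. It would matter when loading fc weights that were trained with a different flatten order.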

xian_kgx · February 28, 2019, 5:56am

What if I have something like this:

conv1 -> flatten -> dense -> dense -> reshape -> conv2

Here conv1’s output shape is (batch, c, h, w) and conv2’s input shape is (batch, h, w, c). Do you think I should permute the dimensions somewhere in between, or do you think the fully connected layers will do the mapping?

MariosOreo · February 28, 2019, 6:11am

A conv2 input shape of (batch, h, w, c) does not make sense to me.
The input and output of a conv layer should be (batch, c, h, w); if you pass (batch, h, w, c) instead, you will generally get a channel-mismatch error.

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 5, kernel_size=3, padding=1)
        # nn.Linear acts on the last dimension only (here W = 5).
        self.fc1 = nn.Linear(5, 10)
        self.fc2 = nn.Linear(10, 5)
        self.conv2 = nn.Conv2d(5, 3, kernel_size=3, padding=1)

    def forward(self, x):
        output = self.conv1(x)       # (1, 5, 5, 5), NCHW
        output = self.fc1(output)    # (1, 5, 5, 10)
        output = self.fc2(output)    # (1, 5, 5, 5)
        output = self.conv2(output)  # (1, 3, 5, 5)
        return output


a = torch.randn(size=(1, 3, 5, 5))
model = Model()
output = model(a)
print(output.shape)  # torch.Size([1, 3, 5, 5])
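As a small illustration of the layout point above (a sketch with made-up shapes, not part of the model), passing a channels-last tensor to nn.Conv2d raises a RuntimeError when dimension 1 does not match the expected channel count, and a permute back to (batch, c, h, w) fixes it:

```python
import torch
import torch.nn as nn

conv2 = nn.Conv2d(5, 3, kernel_size=3, padding=1)  # expects 5 input channels

# Hypothetical channels-last input: (batch, h, w, c) = (1, 7, 7, 5).
x_nhwc = torch.randn(1, 7, 7, 5)

try:
    conv2(x_nhwc)  # dimension 1 has size 7, but conv2 expects 5 channels
    raised = False
except RuntimeError:
    raised = True

# Move the channel axis to position 1 to get (batch, c, h, w).
x_nchw = x_nhwc.permute(0, 3, 1, 2)
out = conv2(x_nchw)
print(raised, out.shape)
```

Note that if h happened to equal the expected channel count, no error would be raised and the conv would silently treat the height axis as channels, so it is safer to permute explicitly than to rely on the error.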

xian_kgx · February 28, 2019, 6:44am

Have a look at the YOLO architecture in the picture above.
Notice that there are two dense layers between the front conv layers (output shape: 7x7x1024) and the final output, which is 7x7x30.

I’m trying to implement it in PyTorch, so the output of the layer before the two dense layers is 1024x7x7 (CxHxW).

Then I first reshape/flatten before passing to the two dense layers.

After the second dense layer, I again reshape it to 7x7x30.

Notice that I went from 1024x7x7 (CxHxW) to 7x7x30 (HxWxC), since my labels are built in that format. My question is whether this is acceptable. I believe the dense layers in between should handle the mapping from CxHxW to HxWxC. Or should I permute the dimensions somewhere in between so that both are CxHxW (or both HxWxC)?
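For reference, this is roughly what I have, as a minimal sketch. The class name DetectionHead, the small hidden size of 64, and the ReLU between the dense layers are placeholders I picked to keep the example light, not values taken from the architecture picture.

```python
import torch
import torch.nn as nn


class DetectionHead(nn.Module):
    """Two dense layers mapping a CxHxW feature map to an HxWxC prediction grid."""

    def __init__(self, in_channels=1024, grid=7, out_channels=30, hidden=64):
        super().__init__()
        self.grid = grid
        self.out_channels = out_channels
        self.fc1 = nn.Linear(in_channels * grid * grid, hidden)
        self.fc2 = nn.Linear(hidden, grid * grid * out_channels)

    def forward(self, feat):
        x = feat.flatten(start_dim=1)   # (batch, 1024*7*7), flattened in CHW order
        x = torch.relu(self.fc1(x))     # placeholder activation
        x = self.fc2(x)                 # (batch, 7*7*30)
        # Interpret the flat output directly as (h, w, c) to match the labels.
        return x.view(-1, self.grid, self.grid, self.out_channels)


head = DetectionHead()
feat = torch.randn(2, 1024, 7, 7)    # output of the last conv layer (NCHW)
labels = torch.zeros(2, 7, 7, 30)    # dummy labels in (batch, h, w, c)
pred = head(feat)
print(pred.shape)
```

As far as I can tell, the output units of the last dense layer have no built-in spatial meaning, so the layout used in the final view only has to agree with the label layout.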

MariosOreo · February 28, 2019, 7:11am

I’m sorry, I have not run into this situation.
But if your images and labels are loaded by a data loader, both default to the NCHW format, so we do not need to pay attention to the data format or do additional operations.

xian_kgx · February 28, 2019, 7:19am

Thanks for your quick reply. But the output of the network is not an image; it is supposed to be a combination of multi-class classification and regression outputs for object detection (class probabilities and bounding box values).