Convolution to dense, dimension error

Mehdi · February 8, 2018, 2:31pm

Hello,

I'm just starting to learn with images and have a question. I have an 80*80 image going into the following architecture:

A convolution that outputs 16 feature maps using an 8*8 kernel: Conv2d(1,16,(8,8))
A convolution that takes those 16 feature maps as input and outputs 32 feature maps using a 4*4 kernel: Conv2d(16,32,(4,4))
I want to feed the output of this last layer into a 256-unit fully connected layer, but I run into a dimension error.
Any tips ?

Thanks a lot !

richard · February 8, 2018, 3:01pm

Do you know what the size of the output of the last layer (the one before the fully connected layer) is?

Mehdi · February 8, 2018, 3:32pm

I don’t. How can I find out ?

richard · February 8, 2018, 3:34pm

You can print it with print(output.size()) (assuming output is your tensor right before you feed it to the fc layer).
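
A minimal sketch of that check, assuming the two layers from the first post with PyTorch's default stride and no padding. The random input `x` is only a stand-in for a real 80*80 single-channel image, and the names `conv1` and `conv2` are made up for the example:

```python
import torch
import torch.nn as nn

# The two layers described in the first post (defaults: stride 1, no padding).
conv1 = nn.Conv2d(1, 16, (8, 8))
conv2 = nn.Conv2d(16, 32, (4, 4))

# Dummy batch holding one single-channel 80*80 image.
x = torch.randn(1, 1, 80, 80)

# This is the tensor that would go into the fully connected layer.
output = conv2(conv1(x))
print(output.size())
```

The printed size is laid out as [batch, channels, height, width].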

Mehdi · February 8, 2018, 4:09pm

It says [1,32,70,70]. Do I have to multiply all of that? Then it will be a 156800*256 matrix? That sounds like a lot, doesn't it?

richard · February 8, 2018, 6:52pm

That sounds right (see the nn.Linear docs).
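
As a rough sketch of how that fits together: flatten everything except the batch dimension, then give the linear layer 32*70*70 input features. The random `output` tensor below is a stand-in for the real conv output, and the names `fc`, `flat` and `hidden` are illustrative only:

```python
import torch
import torch.nn as nn

# Stand-in for the output of the last conv layer: [batch, channels, height, width].
output = torch.randn(1, 32, 70, 70)

# The linear layer needs in_features = 32 * 70 * 70 = 156800.
fc = nn.Linear(32 * 70 * 70, 256)

# Keep the batch dimension and flatten the rest into one vector per sample.
flat = output.view(output.size(0), -1)
hidden = fc(flat)

print(flat.size())        # torch.Size([1, 156800])
print(hidden.size())      # torch.Size([1, 256])
print(fc.weight.numel())  # 40140800 weights, i.e. 156800 * 256
```

Passing the 4D tensor to the linear layer without flattening it first is a likely cause of the dimension error from the first post.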

If that's too big, you can downsample with a pooling layer, for example max pooling (nn.MaxPool2d).
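
For illustration, here is a sketch with a single 2*2 max pool placed after the last conv layer. The window size and its position are assumptions, not a recommendation for this particular network, and the random `output` tensor stands in for the real conv output:

```python
import torch
import torch.nn as nn

# Stand-in for the output of the last conv layer.
output = torch.randn(1, 32, 70, 70)

# 2*2 max pooling; the stride defaults to the kernel size, so height and width halve.
pool = nn.MaxPool2d(2)
pooled = pool(output)
print(pooled.size())  # torch.Size([1, 32, 35, 35])

# The fully connected layer now needs 32 * 35 * 35 = 39200 input features.
fc = nn.Linear(32 * 35 * 35, 256)
hidden = fc(pooled.view(pooled.size(0), -1))
print(fc.weight.numel())  # 10035200, a quarter of the unpooled 40140800
```

Each additional 2*2 pool cuts the input size of the linear layer by roughly another factor of four.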

Mehdi · February 9, 2018, 10:37am

Thanks.

But is it actually common to have matrices that big? It sounds impressive.

ptrblck · February 9, 2018, 10:54am

In common architectures, pooling operations are usually used.
However, numbers that high are also common.
Have a look at the classic VGG16 architecture.
The last pooling layer returns an activation of [batch, 512, 7, 7], which is fed into a Linear layer with 4096 units. This connection has 512*7*7*4096 = 102,760,448 parameters (skipping the bias here).
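
The arithmetic can be checked with a few lines of plain Python. The variable names are only for illustration; the shapes are the ones quoted above:

```python
# Flattened size of a [512, 7, 7] activation.
in_features = 512 * 7 * 7
out_features = 4096

# One weight per (input, output) pair, plus one bias per output unit.
weights = in_features * out_features
biases = out_features

print(in_features)       # 25088
print(f"{weights:,}")    # 102,760,448
print(f"{weights + biases:,}")  # 102,764,544
```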