Mehdi
February 8, 2018, 2:31pm
1
Hello,
I'm just starting to learn with images, and I have a question. I have an 80*80 image going into the following architecture:
A convolution that outputs 16 feature maps using an 8*8 kernel: Conv2d(1, 16, (8, 8))
A convolution that takes those 16 feature maps as input and outputs 32 feature maps using a 4*4 kernel: Conv2d(16, 32, (4, 4))
I want to take the output of this last layer and feed it into a 256-unit fully connected layer, but I run into a dimensional error.
Any tips?
Thanks a lot!
Do you know what the size of the output of the last layer (the one before the fully connected layer) is?
Mehdi
February 8, 2018, 3:32pm
3
I don’t. How can I find out?
You can print it with print(output.size()) (assuming output is your tensor right before you feed it to the fc layer).
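As a concrete illustration, here is a minimal sketch that reconstructs the two conv layers from the question and prints the intermediate size (assuming stride 1 and no padding, the Conv2d defaults):

```python
import torch
import torch.nn as nn

# Reconstruction of the two conv layers described in the question
conv1 = nn.Conv2d(1, 16, (8, 8))
conv2 = nn.Conv2d(16, 32, (4, 4))

x = torch.randn(1, 1, 80, 80)   # dummy 80x80 grayscale image, batch of 1
out = conv2(conv1(x))
print(out.size())               # -> torch.Size([1, 32, 70, 70])
```

Each valid convolution shrinks the spatial size by kernel_size - 1, so 80 -> 73 -> 70 here.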
Mehdi
February 8, 2018, 4:09pm
5
It says [1,32,70,70]. Do I have to multiply all of that? Then it will be a 156800*256 matrix? That sounds like a lot, doesn’t it?
That sounds right (see the Linear docs).
If that’s too big you can downsample using a pooling layer, something like MaxPool2d.
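A minimal sketch of that suggestion (omitting nonlinearities for brevity): a 2x2 max pool halves each spatial dimension, 70x70 -> 35x35, so the flattened input to the fully connected layer drops from 32*70*70 = 156800 to 32*35*35 = 39200:

```python
import torch
import torch.nn as nn

# Sketch: downsample with max pooling before the fully connected layer
model = nn.Sequential(
    nn.Conv2d(1, 16, (8, 8)),   # 80x80 -> 73x73
    nn.Conv2d(16, 32, (4, 4)),  # 73x73 -> 70x70
    nn.MaxPool2d(2),            # 70x70 -> 35x35
)
fc = nn.Linear(32 * 35 * 35, 256)

x = torch.randn(1, 1, 80, 80)
features = model(x)                            # [1, 32, 35, 35]
out = fc(features.view(features.size(0), -1))  # flatten, then fc
print(out.size())                              # -> torch.Size([1, 256])
```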
Mehdi
February 9, 2018, 10:37am
7
Thanks.
But is it actually common to have matrices that big? It sounds impressive.
ptrblck
February 9, 2018, 10:54am
8
Pooling operations are usually used in the common architectures.
However, such high numbers are also common to see.
Have a look at the classic VGG16 architecture.
The last pooling layer returns an activation of [batch, 512, 7, 7], which is fed into a Linear layer with 4096 units. This connection has 512*7*7*4096 = 102,760,448 parameters (skipping the bias here).
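The arithmetic behind that parameter count can be checked directly:

```python
# Weight count of VGG16's first classifier layer (bias excluded):
in_features = 512 * 7 * 7          # flattened activation, 25088 values
out_features = 4096                # units in the first Linear layer
weights = in_features * out_features
print(weights)                     # -> 102760448
```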