I have been searching around and cannot find an easy answer on how to dynamically calculate the output size of a stack of convolutional layers. I see the formula here, and most of the terms are obvious except for the dilation term.
With stride=1, kernel=3
I am using Omniglot
- inputs: (batch, 1, 28, 28)
- outputs: (batch, 64, 1, 1)
- dilation: 14.5???
and ImageNet
- inputs: (batch, 3, 84, 84)
- outputs: (batch, 64, 5, 5)
- dilation: 40.5???
These don't seem to make sense.
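
For reference, here is a minimal sketch of how the per-layer formula can be applied layer by layer instead of being solved for dilation. Dilation is a layer hyperparameter (the spacing between kernel elements, 1 for an ordinary convolution), so it is an input to the formula rather than an unknown. The helper names `conv_out_size` and `stack_out_size` are made up for illustration, and the architecture below (four blocks of a 3x3 convolution with padding 1 followed by a 2x2 max-pool) is only an assumption that happens to be consistent with the shapes listed above; the actual network may differ.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv or pooling layer.

    Standard formula:
    floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    """
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def stack_out_size(size, layers):
    """Apply the formula layer by layer; `layers` is a list of keyword dicts."""
    for layer in layers:
        size = conv_out_size(size, **layer)
    return size


# Hypothetical architecture (an assumption, not taken from the question):
# four blocks, each a 3x3 conv with padding=1 followed by a 2x2 max-pool.
block = [
    dict(kernel=3, stride=1, padding=1),  # conv: keeps the spatial size
    dict(kernel=2, stride=2),             # max-pool: roughly halves it
]
layers = block * 4

print(stack_out_size(28, layers))  # Omniglot-sized input
print(stack_out_size(84, layers))  # 84x84 input
```

Under that assumed architecture the spatial size goes 28 -> 14 -> 7 -> 3 -> 1 and 84 -> 42 -> 21 -> 10 -> 5, so the reduction would come from the pooling layers, with dilation left at 1 throughout.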