What is the dilation in the convolutional layer output size formula?

I have been searching around and I cannot find any easy answers to how to dynamically calculate the output size of a set of convolutional layers. I see the formula here and most of the terms are obvious except for the dilation term.

With stride=1, kernel=3

I am using omniglot

  • inputs: (batch, 1, 28, 28)
  • outputs: (batch, 64, 1, 1)
  • dilation: 14.5???

and imagenet

  • inputs: (batch, 3, 84, 84)
  • outputs: (batch, 64, 5, 5)
  • dilation: 40.5???

These don't seem to make sense.


Most models use dilation=1. Where did you get these numbers from?

Also this blogpost has a nice visualization of what this parameter is doing.
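To make the dilation term concrete, here is a minimal sketch of the usual output size formula for one spatial dimension, out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride + 1). The helper name `conv_out_size` is made up for illustration and is not part of any library.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv layer."""
    # A dilated kernel spans dilation * (kernel - 1) + 1 input positions,
    # so dilation=1 means an ordinary, gap-free kernel.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


print(conv_out_size(28, kernel=3))              # 26
print(conv_out_size(28, kernel=3, dilation=2))  # 24
print(conv_out_size(28, kernel=3, padding=1))   # 28
```

Dilation is a hyperparameter you pick for the layer (an integer, almost always 1), not something you solve for from the input and output sizes, which is why values like 14.5 fall out when you try.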

I wasn’t used to working with convolutions. I forgot that the max pooling layers, not the conv2d itself, were causing most of the sizing issues.
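For anyone who lands here later, this is roughly how the sizes work out once pooling is included. It is only a sketch under assumptions that are not stated above: four blocks, each a 3x3 conv with stride=1 and padding=1 followed by a 2x2 max pool with stride=2. The function names are made up for illustration.

```python
def out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (conv and pooling share this formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def backbone_out_size(size, blocks=4):
    """Spatial size after `blocks` conv + max-pool blocks (assumed architecture)."""
    for _ in range(blocks):
        # Assumed 3x3 conv, stride=1, padding=1: leaves the size unchanged.
        size = out_size(size, kernel=3, stride=1, padding=1)
        # Assumed 2x2 max pool, stride=2: halves the size, rounding down.
        size = out_size(size, kernel=2, stride=2)
    return size


print(backbone_out_size(28))  # 1  (28 -> 14 -> 7 -> 3 -> 1)
print(backbone_out_size(84))  # 5  (84 -> 42 -> 21 -> 10 -> 5)
```

Under those assumptions the results match the (batch, 64, 1, 1) and (batch, 64, 5, 5) shapes above with dilation=1 throughout, so all of the shrinking comes from the pooling.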