Dynamic Structure of CNN

The easiest way to get something running is to bring all the images to the same size. Note that you don't necessarily need to warp (rescale) them; you can simply pad the smaller images with zeros.
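A minimal sketch of zero-padding a batch to a common size, in plain Python. It assumes single-channel images stored as nested lists and pads only on the bottom and right; the helper names `pad_to_size` and `pad_batch` are hypothetical. In a real pipeline you would more likely use your framework's padding utility (e.g. `torch.nn.functional.pad` or NumPy's `np.pad`).

```python
from typing import List

Image = List[List[float]]  # hypothetical layout: one channel, a list of pixel rows


def pad_to_size(img: Image, height: int, width: int, fill: float = 0.0) -> Image:
    """Pad `img` with `fill` on the bottom and right so it becomes height x width."""
    h = len(img)
    w = len(img[0]) if img else 0
    if h > height or w > width:
        raise ValueError("image is larger than the target size")
    # Extend each existing row to the target width...
    padded = [row + [fill] * (width - w) for row in img]
    # ...then append whole rows of padding to reach the target height.
    padded += [[fill] * width for _ in range(height - h)]
    return padded


def pad_batch(images: List[Image]) -> List[Image]:
    """Pad every image to the largest height and width found in the batch."""
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    return [pad_to_size(img, max_h, max_w) for img in images]
```

For example, `pad_batch([[[1, 2], [3, 4]], [[5, 6, 7]]])` turns a 2x2 and a 1x3 image into two 2x3 images. Padding only bottom/right keeps the original pixels at the same coordinates; centering the image instead is an equally valid choice.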

As for the kernel sizes, it is much easier and probably more efficient to use only small kernels (say 3x3) and a deeper CNN. In fact, by going deeper you can reach a receptive field as large as you like using only 3x3 kernels. For instance, here is an illustration of how to get a 5x5 receptive field by stacking two 3x3 convolutions:

[Figure: two stacked 3x3 convolutions covering a 5x5 receptive field]
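To make the "go deeper" argument concrete, here is a small sketch that computes the receptive field of stacked convolutions, assuming stride 1 and no dilation (each k x k layer then widens the field by k - 1). The function names are hypothetical helpers for illustration, not part of any library.

```python
import math
from typing import Sequence


def receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field of stacked stride-1, undilated convolutions."""
    r = 1  # start from a single input pixel
    for k in kernel_sizes:
        r += k - 1  # each k x k layer adds k - 1 pixels of context
    return r


def layers_needed(target: int, kernel_size: int = 3) -> int:
    """Smallest number of stacked k x k stride-1 convs whose field covers `target`."""
    return math.ceil((target - 1) / (kernel_size - 1))
```

Here `receptive_field([3, 3])` gives 5, matching the illustration, and three 3x3 layers give 7, the same field as a single 7x7 kernel. Strides or pooling between layers make the field grow faster than this stride-1 formula suggests.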

You also save parameters this way: two stacked 3x3 convs have 2 x 3 x 3 = 18 weights versus 1 x 5 x 5 = 25 for a single 5x5 conv (per input/output channel pair, ignoring biases).
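The same count extends to layers with many channels. A minimal sketch, assuming every layer maps C channels to C channels and stores a C_out x C_in x k x k weight plus C_out biases (the layout used by e.g. PyTorch's `nn.Conv2d`); `conv_params` and `stacked_params` are hypothetical helpers.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int, bias: bool = True) -> int:
    """Number of weights (plus optional biases) in one 2D conv layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def stacked_params(num_layers: int, kernel_size: int, channels: int, bias: bool = True) -> int:
    """Parameters of `num_layers` identical convs that keep `channels` channels."""
    return num_layers * conv_params(kernel_size, channels, channels, bias)


# Single channel, no bias: reproduces the 18 vs 25 comparison.
print(stacked_params(2, 3, 1, bias=False), stacked_params(1, 5, 1, bias=False))
# With 64 channels the 18:25 ratio of weights is unchanged.
print(stacked_params(2, 3, 64, bias=False), stacked_params(1, 5, 64, bias=False))
```

Ignoring biases, the saving is 18/25 of the weights for any channel count C, and it grows with depth: three 3x3 layers need 27 weights per channel pair versus 49 for one 7x7.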

Maybe I misunderstood why you need this dynamic architecture. But if the only reason is the varying image sizes, I'd recommend trying this first; it should work just fine.
