Smaller filters = more memory!?

I am training a fully convolutional network on GPU. With the same network, the same data, and the same training loop, changing all of the filter sizes from 5 to 3 results in an OOM error (CUDA out of memory. Tried to allocate 102.75 MiB (GPU 0; 7.91 GiB total capacity; 4.79 GiB already allocated; 66.81 MiB free; 14.80 MiB cached)).

For some strange reason, changing the filter sizes to 7 does not trigger the same issue, while it occurs again if I switch to kernel sizes of (5, 5, 1). The filters do not affect the size of the layers, since I'm using padding.

I also noticed that when testing on random data all networks work just fine, but they stop working on my real data. Could this depend on the input sizes being different? Even so, that would not explain why a 5x5x5 filter works while a 3x3x3 does not.
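Since no code was posted, here is a minimal sketch of the kind of setup being described; the depth, channel widths, and names are assumptions, not the actual architecture. With `padding = k // 2`, the spatial size is preserved for any odd kernel size, so only the filter size varies between runs:

```python
import torch
import torch.nn as nn

# Hypothetical fully convolutional 3D network; the real architecture was
# not posted, so layer count and channel widths are illustrative only.
def make_fcn(k):
    pad = k // 2  # "same" padding for odd kernel sizes
    return nn.Sequential(
        nn.Conv3d(1, 16, kernel_size=k, padding=pad),
        nn.ReLU(),
        nn.Conv3d(16, 16, kernel_size=k, padding=pad),
        nn.ReLU(),
        nn.Conv3d(16, 1, kernel_size=k, padding=pad),
    )

x = torch.randn(1, 1, 32, 32, 32)
for k in (3, 5, 7):
    y = make_fcn(k)(x)
    assert y.shape == x.shape  # output size is independent of k
```

With padding chosen this way, activation tensors have identical shapes for k = 3, 5, and 7, which is why the OOM difference is surprising at first glance.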

That is because when you use smaller filters, smaller kernels are created, i.e. the complexity of computing a single image increases, because a small kernel has to move over more pixels. The total complexity for the whole dataset is that per-image cost multiplied by the number of images.

In simple words, small filters increase the accuracy, but the complexity increases at the same time, and thereby the memory, layer after layer.

Large filters decrease accuracy, but complexity and memory usage are reduced.

Therefore, smaller filters = more memory!

Does "complexity" mean the computational cost? If so, I think the computational cost for larger kernels (i.e. 5x5, 7x7) is higher than for smaller ones in this case (padded to the same output size).
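A quick back-of-the-envelope count supports this point: with "same" padding the number of output elements is fixed, so the multiply-accumulates per output element scale with the kernel volume. A rough sketch for single-channel 3D convolution, ignoring channels and bias:

```python
# Multiply-accumulate operations per output voxel for a single-channel
# 3D convolution with "same" padding: the output size is fixed by the
# padding, so the cost per voxel scales with the kernel volume k**3.
def macs_per_voxel(k):
    return k ** 3

for k in (3, 5, 7):
    print(f"{k}x{k}x{k} kernel: {macs_per_voxel(k)} MACs per output voxel")
# 3x3x3 -> 27, 5x5x5 -> 125, 7x7x7 -> 343: larger kernels cost more
# compute, not less, when the output size is held constant.
```

So the OOM with 3x3x3 kernels is unlikely to come from raw compute cost; a more plausible suspect is the backend choosing a different convolution algorithm (with a different workspace size) for different kernel shapes.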
@RiccardoDF could you post some snippets, let us reproduce the case you encountered.

I found a workaround: all it takes is to run the code in benchmark mode.
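For anyone landing here later: "benchmark mode" presumably refers to PyTorch's cuDNN autotuner flag. My reading (an assumption, not confirmed by the poster) is that cuDNN's candidate convolution algorithms need different amounts of workspace memory, so letting the autotuner pick an algorithm per input shape can change peak memory compared with the default heuristic choice:

```python
import torch

# Enable cuDNN benchmark mode: for each new input shape, cuDNN times the
# candidate convolution algorithms and caches the fastest one. Different
# algorithms require different workspace sizes, so the selected algorithm
# (and hence peak GPU memory) can change when this flag is set.
torch.backends.cudnn.benchmark = True
```

Note that benchmark mode helps most when input shapes are fixed; with varying input sizes (as with the real data above), re-benchmarking on every new shape adds overhead.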