Explanation of MaxPool2d

tomera · February 5, 2020, 4:26pm

Hi, I need help understanding what the MaxPool2d function does.
I read the docs and all the examples, but I'm still not sure about it.
From the docs:

“Applies a 2D max pooling over an input signal composed of several input planes.”

What is “max pooling” and why do we need it?

Thanks.

martinr · February 5, 2020, 6:16pm

Well, you can probably get a better answer from any machine learning book, but in essence it scales down the matrix. You will still have a summary of the same data (the largest value from each window), just in a scaled-down version. How much it scales is determined by the stride parameter.

Kernel size (a small window to look through) determines the area to “pool” over, and stride determines the step. Imagine it starts applying the kernel at the upper left corner and then moves the kernel by the stride. So a 1x1 kernel with stride 1 does nothing and keeps the input as is. A 2x2 kernel with stride 2 will shrink the data by a factor of 2 in each dimension. The shrinking effect comes from the stride parameter (the step to take). A 1x1 kernel with stride 2 will also shrink the data by 2, but it will just keep every second pixel, while the 2x2 kernel will keep the max pixel from each 2x2 area.
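To make the three kernel/stride cases concrete, here is a minimal pure-Python sketch of the sliding-window idea. The helper name `max_pool_2d` and the 4x4 sample values are made up for illustration; it ignores padding, dilation, batches and channels, so treat it as a picture of the mechanism rather than what PyTorch actually runs.

```python
def max_pool_2d(image, kernel, stride):
    """Slide a kernel x kernel window over a 2D list and keep each window's max.

    Hypothetical helper for illustration only: no padding, no dilation.
    """
    height, width = len(image), len(image[0])
    pooled = []
    # Move the window's top-left corner by `stride` in both directions.
    for top in range(0, height - kernel + 1, stride):
        row = []
        for left in range(0, width - kernel + 1, stride):
            window = [image[top + i][left + j]
                      for i in range(kernel) for j in range(kernel)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Made-up 4x4 input.
image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]

same = max_pool_2d(image, kernel=1, stride=1)         # unchanged copy of the input
halved_max = max_pool_2d(image, kernel=2, stride=2)   # [[6, 5], [7, 9]]
halved_skip = max_pool_2d(image, kernel=1, stride=2)  # [[1, 2], [7, 9]]
```

Both strided calls return a 2x2 result, but only the 2x2 kernel looks at every input value before picking one: the 6 in the top-left window survives there, while the 1x1 kernel just samples the corner value 1.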

You can also achieve the shrinking effect by using a stride on the conv layer directly. The difference is that a conv layer also learns, while a pooling layer does not. There is no learning happening in the pooling layer, because it has no trainable parameters.
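A quick way to see the pooling-versus-strided-conv difference is to compare output shapes and parameter counts. This is a small sketch assuming PyTorch is installed; the input size (1, 3, 8, 8) and the channel counts are arbitrary choices for the example.

```python
import torch
from torch import nn

# Arbitrary example input: (batch, channels, height, width).
x = torch.randn(1, 3, 8, 8)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=2, stride=2)

pool_out = pool(x)
conv_out = conv(x)

# Both layers halve height and width: 8x8 becomes 4x4.
print(pool_out.shape, conv_out.shape)

# Count trainable values: the pooling layer has none,
# the conv layer has weights (3 * 3 * 2 * 2) plus biases (3).
pool_params = sum(p.numel() for p in pool.parameters())
conv_params = sum(p.numel() for p in conv.parameters())
print(pool_params, conv_params)
```

The two layers shrink the input the same way, but only the conv layer gives the optimizer anything to update.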

tomera · February 5, 2020, 7:10pm

Thanks for the detailed answer.

The ML course I took didn’t mention pool size or shrinking techniques. They did mention mini-batches and other shrinking methods for linear and logistic regression. Is this the equivalent for NNs?
If we don’t define a pool, what will happen? It’s only for performance, right? What is the motivation to shrink the data?

martinr · September 8, 2021, 8:06pm

This answer perhaps comes too late for you, but the major use case of pooling is that when you shrink the data, a convolutional layer will learn higher-level features (i.e. pooling aggregates information). E.g., your first (few) high-dimensional layer(s) will learn small edges, curves, etc.; your pooled layer will learn lines and shapes; and finally a further pooled layer recognizes whole objects. So it is not only for performance, but to actually learn