What does adaptive average pooling do and when to use it?

I’ve seen the code for the CNN models in torchvision and I’ve noticed that nn.AdaptiveAvgPool2d is used after the convolutional layers before passing the flattened inputs into the fully-connected layers. So why is it used and is the output size of nn.AdaptiveAvgPool2d always set to whatever the output size of the input is after passing it through the convolutional layers?

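nn.Linear needs a fixed in_features, which here is C x H x W of the flattened feature map. H and W depend on the input resolution, so without a fixed spatial size the flattened length would change with the size of the input image.
According to the documentation, AdaptiveAvgPool2d: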

Applies a 2D adaptive average pooling over an input signal composed of several input planes.

The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.

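So it is used to keep in_features fixed for any input resolution: whatever spatial size the convolutional layers produce, the layer pools it to the output size you specify. That output size is not derived from the input at run time. It is a value chosen when the model is built, and it has to match the in_features of the following nn.Linear.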
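To make this concrete, here is a minimal sketch. It assumes a hypothetical toy network (one conv layer, a 4 x 4 pooling target, 10 output classes), not any actual torchvision model. The conv output size changes with the input resolution, but the pooled output and the classifier input stay the same.

```python
import torch
import torch.nn as nn

# Hypothetical toy network for illustration only (not a torchvision model).
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)
pool = nn.AdaptiveAvgPool2d((4, 4))      # output is always 4 x 4 per channel
classifier = nn.Linear(8 * 4 * 4, 10)    # in_features = C * H * W = 8 * 4 * 4


def forward(x):
    x = features(x)
    x = pool(x)
    x = torch.flatten(x, 1)              # keep the batch dimension
    return classifier(x)


# Record (conv output shape, pooled shape, final output shape) per input size.
shapes = {}
for size in (32, 64, 100):
    x = torch.randn(2, 3, size, size)
    conv_out = features(x)
    pooled = pool(conv_out)
    shapes[size] = (
        tuple(conv_out.shape),
        tuple(pooled.shape),
        tuple(forward(x).shape),
    )
    print(size, shapes[size])
```

Without the pooling layer, the flattened length would be 8 * 16 * 16 for a 32 x 32 input but 8 * 32 * 32 for a 64 x 64 input, so a single nn.Linear could not accept both.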