Dynamic vs static set of images

I have a problem where I want to create a network that classify a set of images, however the number of images are not fixed, How do I come up with a solution such that the network deals with this dynamic behavior of set of inputs.

Thank you.

You can train for a large set and pad shorter ones (that’s the typical approach when working with sequences)
Another idea is to compute N features for N images and then doing an average (which is somehow a good option)

average the features? Thank you for your help

Well, average pooling is nothing but an average and it works.
Averaging will just generate a global descriptor for your set.
Then you need just to classify that global descriptor.

Ofc this approach works for stable scenarios where each element in the set does a soft contribution.

So imagine you have a set which is {penguin, penguin, ostrich, chicken,…, }
and you have 20 elements like these. You would classify it as “birds”.

If all of a sudden you add a human as 21st element, its features will be dissolved and prob still classified as birds. Although you may want to classify that as “biped”.

In case where single elements may change radically the meaning of the set, average won’t work very well.