Multiple Object Detection from scratch

Hello all,
I am looking to create a model to do multiple object detection from scratch.

I understand the basics of how to create a (CNN) model for single object classification and localization, but I wasn’t able to find a tutorial on how to write, from scratch, a PyTorch class that does multiple object detection and classification.

Looking at implementations of some well-known networks, they seem to use a sliding window algorithm or a grid, but I am not really sure how to implement either.
I am also having a hard time understanding how to return a variable number of labels, depending on the number of objects in the image, from the forward pass of the network.
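
To make the second question concrete, here is a rough sketch of how I currently picture the grid idea (a guess on my part, not taken from any particular network). The names `TinyGridDetector` and `decode`, the layer sizes, and the 0.5 threshold are all made up for illustration. The forward pass always returns a fixed-size tensor with one prediction per grid cell, and a separate step keeps only the cells whose objectness score passes the threshold, so the number of returned labels varies per image:

```python
import torch
import torch.nn as nn


class TinyGridDetector(nn.Module):
    """Hypothetical grid-style detector: one prediction per grid cell."""

    def __init__(self, num_classes: int = 3, grid_size: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(grid_size),  # force an S x S feature map
        )
        # Per cell: 1 objectness score + 4 box values + class scores.
        self.head = nn.Conv2d(16, 5 + num_classes, kernel_size=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Output shape is always (batch, S, S, 5 + num_classes).
        return self.head(self.backbone(images)).permute(0, 2, 3, 1)


def decode(pred: torch.Tensor, threshold: float = 0.5):
    """Turn one image's fixed-size grid output into variable-length results."""
    objectness = torch.sigmoid(pred[..., 0])      # (S, S)
    keep = objectness > threshold                 # boolean mask, (S, S)
    boxes = pred[..., 1:5][keep]                  # (N, 4)
    labels = pred[..., 5:][keep].argmax(dim=-1)   # (N,)
    return boxes, labels, objectness[keep]


model = TinyGridDetector()
with torch.no_grad():
    out = model(torch.randn(2, 3, 64, 64))        # fixed-size output
boxes, labels, scores = decode(out[0])            # N depends on the image
```

If this is roughly right, then the variable-length part never happens inside `forward` at all, only in the post-processing. Training targets, the loss, and non-maximum suppression are left out here.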

Could someone please point me in the right direction?

Thanks! :smile: