Can I use torchvision detection models on batches of images with varying sizes?

Hi all,

I would like to run one of the detection models on a batch from a dataset.
I am fairly new to torch (though I like it so far, and I am very comfortable with tensorflow). I tried to use one of the faster-rcnn models to predict on a batch of images.

When calling model(dataloader) I get the following error:

RuntimeError: stack expects each tensor to be equal size, but got [3, 960, 720] at entry 0 and [3, 960, 1021] at entry 2

Is the model complaining, or the dataloader? It seems like it's the dataloader.
Can it not work with varying image sizes?

Thank you for any hints and help!

A CNN expects uniform image sizes. If you feed the model images of varying sizes, it will produce feature vectors of varying sizes. You can address this either by resizing or by padding the images.
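For example, a minimal sketch of the resize approach, assuming PIL input images and torchvision transforms (the (800, 800) target size is an arbitrary choice):

```python
from torchvision import transforms

# Force every image to the same (H, W) so the default collate can
# stack the batch into one tensor.
uniform = transforms.Compose([
    transforms.Resize((800, 800)),  # arbitrary fixed size, pick one that suits your data
    transforms.ToTensor(),          # 3xHxW float tensor in [0, 1]
])
```

Note that for detection, resizing the image also means the bounding boxes in the target have to be rescaled by the same factors.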

Hi Preetham,
thanks, but the detection models actually work with varying sizes. I found a solution for the above: it was indeed the dataloader, and using a custom collate_fn worked (see the sketch below).
Though forked workers (i.e. parallel execution) don't work with my custom collate_fn, which is a bit of a shame.
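In case it helps others, here is a minimal, self-contained sketch of that fix; the dataset and function names are illustrative, not from my actual code. The collate_fn returns the images as a plain list instead of stacking them, and torchvision's detection models accept such a list of differently sized images directly:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

class DummyDetectionDataset(Dataset):
    """Stand-in dataset yielding images of varying widths (illustrative only)."""
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # varying widths reproduce the mismatched shapes from the error above
        return torch.rand(3, 960, 720 + idx * 100), {}

def detection_collate(batch):
    # Keep the images as a list of 3xHxW tensors instead of stacking
    # them, which is the step where default_collate raises the error.
    images = [item[0] for item in batch]
    targets = [item[1] for item in batch]
    return images, targets

loader = DataLoader(DummyDetectionDataset(), batch_size=4,
                    collate_fn=detection_collate)

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
with torch.no_grad():
    for images, _ in loader:
        predictions = model(images)  # a list of differently sized images is fine
```

On the parallel-worker point: if the workers are started with the spawn method (the default on Windows and macOS), the collate_fn has to be picklable, so defining it at module top level rather than as a lambda or local closure often makes num_workers > 0 work again.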