Easy way to load a localization dataset


I’d like to train a model with class labels and bounding box labels at the same time (e.g. the CUB200 2011 dataset or the ImageNet 12 CLSLOC dataset), possibly with cropping/scaling as preprocessing, which should be applied to both the image and the bounding box label.

Is there a simple way to create a dataset and dataloader for this purpose?
Any comments or sample code would be helpful.



It’s actually pretty easy to create a new dataset/loader by inheriting torch.utils.data.Dataset. I’d recommend just looking at how it’s done in the vision repo: https://github.com/pytorch/vision/tree/master/torchvision/datasets
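
For the cropping/scaling part of the question, whatever geometric transform the dataset's `__getitem__` applies to the image has to be applied to the box coordinates too. Here is a minimal pure-Python sketch of the box side only. The helper names `crop_box` and `scale_box` are hypothetical (they are not part of torchvision), and boxes are assumed to be `(xmin, ymin, xmax, ymax)` in pixels:

```python
def crop_box(box, crop):
    """Move an (xmin, ymin, xmax, ymax) box into the frame of a crop window.

    `crop` is (left, top, width, height) in pixels of the original image.
    Coordinates falling outside the window are clipped to its border.
    """
    xmin, ymin, xmax, ymax = box
    left, top, width, height = crop
    # Translate into crop coordinates, then clamp to the crop window.
    xmin = min(max(xmin - left, 0), width)
    ymin = min(max(ymin - top, 0), height)
    xmax = min(max(xmax - left, 0), width)
    ymax = min(max(ymax - top, 0), height)
    return (xmin, ymin, xmax, ymax)


def scale_box(box, old_size, new_size):
    """Rescale a box for an image resized from old_size to new_size.

    Both sizes are (width, height).
    """
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


# Example: take a 200x100 crop whose top-left corner is at (50, 20),
# then resize that crop to 100x100.
box = (60, 30, 160, 90)
cropped = crop_box(box, (50, 20, 200, 100))           # (10, 10, 110, 70)
resized = scale_box(cropped, (200, 100), (100, 100))  # (5.0, 10.0, 55.0, 70.0)
```

The same crop window and target size would be used for the image itself, so the image and the box stay aligned.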
And then just create a custom collate function based on the way you would like to structure your batches. A buddy of mine and I created one for VOC Detection if you want to check out this PR for an idea: https://github.com/pytorch/vision/pull/86 It’ll also be in a detection implementation I should have up by later today if you want to get a better feel for how it’s used…
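
To give a rough idea of how the two pieces fit together, here is a minimal sketch, not the code from the PR. `BoxDataset` and `box_collate` are hypothetical names, the data is synthetic, and it assumes all images already have the same size while the number of boxes per image may vary:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class BoxDataset(Dataset):
    """Hypothetical dataset yielding (image, class_label, boxes) per sample."""

    def __init__(self, images, labels, boxes):
        # images: list of CxHxW float tensors (same size after preprocessing)
        # labels: list of int class ids
        # boxes:  list of Nx4 tensors (xmin, ymin, xmax, ymax); N may differ
        self.images, self.labels, self.boxes = images, labels, boxes

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index], self.labels[index], self.boxes[index]


def box_collate(batch):
    """Stack images and labels; keep boxes as a list since their counts vary."""
    images, labels, boxes = zip(*batch)
    return torch.stack(images, 0), torch.tensor(labels), list(boxes)


# Synthetic data standing in for a real dataset.
images = [torch.zeros(3, 32, 32) for _ in range(4)]
labels = [0, 1, 2, 3]
boxes = [torch.rand(n, 4) for n in (1, 2, 1, 3)]

loader = DataLoader(BoxDataset(images, labels, boxes),
                    batch_size=2, collate_fn=box_collate)
batch_images, batch_labels, batch_boxes = next(iter(loader))
```

The default collate function would try to stack the box tensors and fail when the counts differ, which is why the boxes are returned as a plain list here. With exactly one box per image, as in a pure localization dataset, the boxes could be stacked like the images instead.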

Thanks @amdegroot !
I suddenly had a lot of things to deal with, so I only got to your reply just now.
Anyway, thanks for your advice and the code example. I’ll try it soon!