Hello – first post here, so please let me know if there is anything I can provide to augment this question. I don’t think I am posting a duplicate because I have searched for an answer pretty exhaustively at this point.
I think what I am trying to do is pretty simple – I am trying to write a binary classifier for multiple time series data that are in the form of images (basically a bunch of “videos”). I am loading the dataset using ImageFolder but I want to batch every series of images together. These series are sorted into a specific folder but of course, ImageFolder simply loads everything as one “series”. How can I load the data such that I can specify which images are in a batch? I think the simplest way to do this is to have each series of arbitrary length in its own folder.
I would probably write a custom dataset.
It’s not all that hard, and it will turn out a lot cleaner than trying to bend ImageFolder to your bidding.
(To my mind, this is the philosophy behind Dataset and DataLoader: Dataset is the problem-specific part and is intended to be easy to implement; DataLoader is the generic part, does all sorts of weird tricks for you, and is intended to be used as is.)
If you allow yourself to feed in the categories (I usually code them into the dataset because I prefer error messages over silent failures should a class folder be missing), it should just be a matter of a few sorted(glob.glob(...)) calls and opening the images with
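For illustration only, here is a rough sketch of what such a custom dataset might look like. It is not a definitive implementation, and several things in it are assumptions rather than anything stated above: the directory layout root/<class>/<series>/<frame>.png, the .png extension, the class name SeriesFolderDataset, the helper load_image, and the use of Pillow to open the frames. Each item is one whole series, so series of different lengths are never mixed.

```python
import glob
import os

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the indexing logic still runs without PyTorch
    Dataset = object


def load_image(path):
    # Assumed loader (Pillow); swap in whatever decoding/transform you need.
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class SeriesFolderDataset(Dataset):
    """Hypothetical dataset: one item = all frames of one series + its label."""

    def __init__(self, root, classes, loader=load_image):
        self.loader = loader
        self.series = []  # list of (sorted frame paths, class index)
        for label, name in enumerate(classes):
            class_dir = os.path.join(root, name)
            # Fail loudly instead of silently skipping a missing class folder.
            if not os.path.isdir(class_dir):
                raise FileNotFoundError(f"missing class folder: {class_dir}")
            for series_dir in sorted(glob.glob(os.path.join(class_dir, "*"))):
                # Lexicographic sort: assumes zero-padded frame names.
                frames = sorted(glob.glob(os.path.join(series_dir, "*.png")))
                if frames:
                    self.series.append((frames, label))

    def __len__(self):
        return len(self.series)

    def __getitem__(self, idx):
        frames, label = self.series[idx]
        return [self.loader(p) for p in frames], label
```

If the loader returns tensors, passing this to DataLoader(dataset, batch_size=None, shuffle=True) disables automatic batching, so each iteration yields exactly one series; stacking the frames into a single tensor inside __getitem__ would be a natural next step.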
We have a new library in beta phase, torchdata, that has some built-in functionality for some of the things you are looking for (e.g. listing files in a directory, opening them, grouping them), if an iterable-style dataset is acceptable for your use case. Otherwise, writing your own Dataset as Thomas suggested is a good idea as well.