A lot of my time on any research project is spent building Dataset subclasses to wrap publicly available datasets. Also, most "official implementation of *" repositories have their own wrappers for the same datasets that basically do the same thing, but each time you have to figure out exactly what transformations are applied, how the data is sampled, etc. It's repetitive and probably the least appealing part of the work.
PyTorch already has a fairly standardized way of implementing dataset classes and transforms, so creating a single repository where people contribute wrappers for publicly available datasets doesn't seem like a stretch. Even if a particular project needs the data packaged differently, it's still way better to have a starting point.
I don't think I'm the first one to come up with this idea, so my question is: why isn't a Dataset Wrapper Zoo a thing yet?