Best practice for structuring and loading multi-label classification data

Hello guys,

I’m wondering if there’s a standardized/best-practice way of both storing and loading data for a multi-label dataset. For my use case, I’m generating data from an audio dataset: I cut audio tracks into spectrograms as input features, and each input is multi-labeled across 18 different classes.

After preprocessing one audio track I get, for example, a (33 x 96 x 86) NumPy array of 33 inputs and a corresponding (33 x 18) NumPy array of labels.
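
For concreteness, the saving step looks roughly like this (`save_track` is just my own helper, not anything standard, and the filenames match the tree below):

```python
import numpy as np
from pathlib import Path

def save_track(out_dir: Path, track_name: str,
               spectrograms: np.ndarray, labels: np.ndarray) -> None:
    # `spectrograms` is the (33, 96, 86) array and `labels`
    # the (33, 18) array from one preprocessed track.
    track_dir = out_dir / track_name
    track_dir.mkdir(parents=True, exist_ok=True)
    np.save(track_dir / "spectrograms.npy", spectrograms)
    np.save(track_dir / "labels.npy", labels)
```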

Right now I’m storing everything like this:

└── Preprocessed
    ├── Track1
    │   ├── labels.npy
    │   └── spectrograms.npy
    ├── Track2
    │   ├── labels.npy
    │   └── spectrograms.npy
    └── Track3
        ├── labels.npy
        └── spectrograms.npy

The reason for this structure is so that I can keep track of which track each input comes from, in case I want to balance the train/test split somehow (e.g. splitting at the track level so clips from the same track never end up on both sides, as in the sketch below).
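
Something like this hypothetical split is what I have in mind (`split_tracks` is a placeholder name; it just shuffles the per-track directories and divides them):

```python
import random
from pathlib import Path

def split_tracks(root: Path, test_fraction: float = 0.2, seed: int = 0):
    # Split at the track level: every clip from a given track
    # lands entirely in either the train set or the test set.
    track_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    random.Random(seed).shuffle(track_dirs)
    n_test = int(len(track_dirs) * test_fraction)
    return track_dirs[n_test:], track_dirs[:n_test]  # (train, test)
```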

However, it feels like kind of a hassle to build a Dataset and a corresponding __getitem__ in PyTorch on top of this structure, so I’m wondering: is there a smarter, standardized way to structure and load multi-labeled data?
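
For reference, the straightforward version I’ve sketched looks roughly like this (eager loading of every .npy file up front; `SpectrogramDataset` is just a placeholder name):

```python
import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset

class SpectrogramDataset(Dataset):
    """Loads every track's .npy files eagerly and concatenates them,
    remembering which track each clip came from."""

    def __init__(self, track_dirs):
        specs, labels, track_ids = [], [], []
        for i, d in enumerate(track_dirs):
            s = np.load(d / "spectrograms.npy")  # (N, 96, 86)
            l = np.load(d / "labels.npy")        # (N, 18)
            specs.append(s)
            labels.append(l)
            track_ids.append(np.full(len(s), i))
        self.spectrograms = torch.from_numpy(np.concatenate(specs)).float()
        self.labels = torch.from_numpy(np.concatenate(labels)).float()
        self.track_ids = np.concatenate(track_ids)  # kept for balancing

    def __len__(self):
        return len(self.spectrograms)

    def __getitem__(self, idx):
        return self.spectrograms[idx], self.labels[idx]
```

I’d then build one dataset per split, e.g. `SpectrogramDataset(train_dirs)`, and hand each to a DataLoader. It works, but loading everything into memory and juggling the per-track bookkeeping myself is what feels clumsy.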