Loading custom audio dataset

When creating a custom dataset loader like the one shown here, is it advisable to do something like

import os

import pandas as pd
import torchaudio
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, csv_file, root_dir):
        # Each CSV row holds the input filename and the target filename
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        audio_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        target_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
        audio, _ = torchaudio.load(audio_path)    # (channels, time)
        target, _ = torchaudio.load(target_path)  # (channels, time)
        return audio, target

when both the input and expected output are waveforms? In my case, I created a CSV file that contains the filenames of both audio files, and I just pass the path-like object to torchaudio.load(). Also, how do I ensure that all the audio and target tensors in a batch are the same length when loaded?

Well, there are two main possibilities.
One is to use a fixed length of audio: you split each sample into fixed-size chunks, so every item the dataset returns already has the same length (a sketch follows below).
The other is to use variable lengths (typical when using transformers).
In that case you should rewrite the DataLoader's collate function.
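
For the fixed-length option, a minimal sketch of a chunking helper might look like this; the chunk_len value and the random-crop strategy are my assumptions, not something prescribed:

import torch

def fixed_length_chunk(waveform, chunk_len=16000):
    # waveform: (channels, time); chunk_len is an assumed value (1 s at 16 kHz)
    num_samples = waveform.shape[-1]
    if num_samples >= chunk_len:
        # Random crop so different epochs see different parts of the file
        start = torch.randint(0, num_samples - chunk_len + 1, (1,)).item()
        return waveform[..., start:start + chunk_len]
    # Zero-pad clips that are shorter than chunk_len
    return torch.nn.functional.pad(waveform, (0, chunk_len - num_samples))

For paired input/target waveforms you would draw start once inside __getitem__ and slice both tensors with it, so input and target stay aligned.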

Here is what I implemented:

def batch_pad(batch):
    # pad_sequence expects (time, channels), so transpose each (channels, time) item
    batch = [item.t() for item in batch]
    # Zero-pad every item to the length of the longest one -> (batch, time, channels)
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    # Back to (batch, channels, time)
    return batch.permute(0, 2, 1)

def collate_fn(batch):
    tensors = []
    targets = []

    # Split the (waveform, target) pairs into two lists
    for waveform, target in batch:
        tensors += [waveform]
        targets += [target]

    # Pad each list independently to its own max length
    tensors = batch_pad(tensors)
    targets = batch_pad(targets)

    return tensors, targets
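
For reference, this is how I wire it into the DataLoader (the paths and batch size here are just placeholders):

from torch.utils.data import DataLoader

dataset = CustomDataset(csv_file="annotations.csv", root_dir="data/")
loader = DataLoader(dataset, batch_size=8, shuffle=True, collate_fn=collate_fn)

for tensors, targets in loader:
    # tensors: (batch, channels, max_audio_len), targets: (batch, channels, max_target_len)
    ...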

Or is there some way I can use batch_pad on both tensors and targets together?

batch_pad just pads the tensors with zeros.
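
For example, with two mono clips of different lengths:

import torch

a = torch.zeros(1, 3)  # mono waveform, 3 samples
b = torch.zeros(1, 5)  # mono waveform, 5 samples
print(batch_pad([a, b]).shape)  # torch.Size([2, 1, 5]): a is zero-padded to 5 samples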

What do you mean by together?