Spectrogram data loading

I am working on speech enhancement with the VCTK database.

I want to load the data and apply the pre-processing simultaneously and efficiently. The pre-processing is performed in just one Python function.

My problem is that when I use the DataLoader, it loads just one wave file at a time, and the amount of training data it returns varies after pre-processing, because wave files of different lengths are chopped into input-sized segments. This means that on every iteration the network is trained with a small, varying batch size, which is time-consuming.

So I want to load and pre-process simultaneously so that training always uses the same batch size (segments from several wav files should be stacked together). For now, I have saved all the pre-processed data as .npy files. Is there a more efficient way to load the data?

You can create a custom collate_fn for your DataLoader that pads or trims the outputs from the Dataset and then stacks the padded/trimmed tensors into a batch. Or, if you are using an RNN, you can put the sequences into a PackedSequence.
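For the PackedSequence route, here is a minimal, untested sketch of such a collate function (the name collate_packed_fn, the (l, n_feat) shape, and the assumption that all targets share one shape are mine, not from your setup):

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

def collate_packed_fn(batch):
    # each item from the Dataset is assumed to be (sig, target),
    # where sig has shape (l, n_feat) with a variable length l
    sigs, targets = zip(*batch)
    lengths = torch.tensor([s.size(0) for s in sigs], dtype=torch.long)
    # pad to the longest sequence in the batch, then pack for the RNN
    padded = pad_sequence(list(sigs), batch_first=True)  # (batch, max_len, n_feat)
    packed = pack_padded_sequence(padded, lengths, batch_first=True,
                                  enforce_sorted=False)
    # stacking targets assumes they all have the same shape
    return packed, torch.stack(targets)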

Thank you for the reply.

Could you give me some simple example code, or a link, showing how to stack the tensors into a batch…

You can search the forum for padding packed sequences, but the gist is this: the collate function takes a list of outputs from the Dataset; you unpack that list, get the lengths, and pad each sample based on the maximum length in the batch.

import torch
from torch.nn.functional import pad

def collate_spectrograms_fn(batch):
    # assuming each sig has size (c, l, n_fft)
    sigs, targets = zip(*batch)
    lengths = torch.tensor([sig.size(1) for sig in sigs], dtype=torch.long)
    max_len = lengths.max().item()
    # F.pad pads from the last dim backwards, so (0, 0, 0, n) pads dim 1 (time)
    sigs = [pad(sig, (0, 0, 0, max_len - sig.size(1))) for sig in sigs]
    return torch.stack(sigs), torch.cat(targets), lengths

I haven’t tested that, but that’s the general idea.
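To use it, you would pass the function to the DataLoader via the collate_fn argument (the dataset variable and batch size below are just placeholders for your own setup):

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=16, shuffle=True,
                    collate_fn=collate_spectrograms_fn)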