My task is to take an episode of a TV show along with its subtitles and make the subtitle timings more accurate (from ~200 ms down to ~20 ms). To do that, I want to learn what is speech and what is not.
I've taken the audio, converted it into a spectrogram, and treated each column of the spectrogram as a single data item. So now I have two tensors:
print(train_speech.size()) # torch.Size([93482, 201])
print(train_silence.size()) # torch.Size([35038, 201])
All I want to do is train a simple multi-layer NN to tell them apart.
train_speech holds FFT frames of people talking, and
train_silence holds frames with no talking (I used the subtitles to make the distinction).
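For context, 201 frequency bins per column corresponds to an STFT with n_fft=400. A minimal sketch of producing one row per spectrogram column (the waveform and hop length here are stand-in assumptions, not the actual episode audio):

```python
import torch

waveform = torch.randn(16000)  # stand-in for one second of 16 kHz audio (assumption)
spec = torch.stft(waveform, n_fft=400, hop_length=160,
                  window=torch.hann_window(400), return_complex=True)
# spec has shape [201, num_frames]; transpose so each row is one spectrogram column
frames = spec.abs().T
print(frames.shape)  # [num_frames, 201]
```

Each row of `frames` is then one data item of size 201, matching the tensors printed above.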
My question is: what DataLoader can I use to load these into PyTorch?
There is the DataLoader, which accepts a Dataset and provides functionality such as shuffling and creating batches using multiple workers.
To create a custom Dataset you could have a look at this tutorial.
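The pattern from that tutorial boils down to three methods. A generic sketch (class and variable names here are made up for illustration):

```python
import torch
from torch.utils.data import Dataset

class ColumnsDataset(Dataset):
    # wraps a [N, F] feature tensor and one label per row
    def __init__(self, columns, labels):
        self.columns = columns
        self.labels = labels

    def __getitem__(self, index):
        # return one (features, label) pair
        return self.columns[index], self.labels[index]

    def __len__(self):
        return len(self.columns)

ds = ColumnsDataset(torch.randn(10, 201), torch.ones(10))
print(len(ds))  # 10
```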
What I don’t get is that my data is already a simple tensor… It doesn’t make sense to me that I need to create a separate abstraction just to fetch numbers from a few arrays…
If your data is already stored as tensors, you can just use TensorDataset, or skip the abstraction completely and feed the data to your model directly.
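A sketch of the TensorDataset route for this case (the two input tensors are random stand-ins shaped like the ones in the post):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# stand-ins with the shapes from the post
train_speech = torch.randn(93482, 201)
train_silence = torch.randn(35038, 201)

# concatenate the two classes and build matching labels: 1 = speech, 0 = silence
data = torch.cat([train_speech, train_silence], dim=0)
labels = torch.cat([torch.ones(len(train_speech)),
                    torch.zeros(len(train_silence))])

train_ds = TensorDataset(data, labels)
train_dl = DataLoader(train_ds, shuffle=True, batch_size=1024)

x, y = next(iter(train_dl))
print(x.shape, y.shape)  # torch.Size([1024, 201]) torch.Size([1024])
```

If the labels are later fed to CrossEntropyLoss, cast them with `y.long()` first.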
Since I had the two classes in separate variables, I ended up writing a custom Dataset class.
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, speech, silence):
        # label each speech frame 1 and each silence frame 0
        self.data = [(x, 1) for x in speech] + [(x, 0) for x in silence]
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

train_ds = MyDataset(train_speech, train_silence)
train_dl = DataLoader(train_ds, shuffle=True, batch_size=1024)
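For completeness, a hedged sketch of the "simple multi-layer NN" on top of such a loader. The hidden size, learning rate, epoch count, and the small stand-in dataset are all assumptions, not values from the post:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# small stand-in loader; in the real run, iterate the train_dl built above
train_dl = DataLoader(TensorDataset(torch.randn(256, 201),
                                    torch.randint(0, 2, (256,))),
                      batch_size=64, shuffle=True)

# 201 frequency bins in, two classes out (0 = silence, 1 = speech)
model = nn.Sequential(nn.Linear(201, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):  # assumed epoch count
    for x, y in train_dl:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
print(loss.item())
```

Note that CrossEntropyLoss expects integer (long) targets, which is why the custom Dataset's 0/1 labels work here directly.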
Thanks for helping me get through this.