Hi, if you are not familiar with PyTorch, I suggest reading this data loading tutorial first.
I am assuming each of your speech samples is just a vector, right? But the vector lengths are not all the same. By default, PyTorch tries to stack the samples in a batch into a single tensor, and this stacking fails when the samples in a batch have variable sizes.
The DataLoader class has a parameter called collate_fn which controls how samples in a batch are packed together. For example, you can keep the samples of a batch in a plain list, with each element being one speech sample, and stack the corresponding labels into a tensor. For more info, refer to this post.