Is there a masking method for CNN input of variable size within a batch?

Hi, if you are not familiar with PyTorch, I suggest reading this data loading tutorial first.

I am assuming each of your speech samples is just a vector, right? But the vectors do not all have the same length. By default, PyTorch tries to stack the samples in a batch into a single tensor. If the samples in a batch have different sizes, that stacking fails.
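To see the failure concretely, here is a minimal sketch. It assumes two made-up 1-D tensors of lengths 5 and 8 stand in for your speech samples, and it uses the default collation with no worker processes.

```python
import torch
from torch.utils.data import DataLoader

# Two stand-in "speech" vectors with different lengths (made-up data).
samples = [torch.randn(5), torch.randn(8)]

# No collate_fn is given, so the default collation tries to stack the samples.
loader = DataLoader(samples, batch_size=2)

try:
    next(iter(loader))
    stacked = True
except RuntimeError as err:
    # Tensors of unequal size cannot be stacked into one batch tensor.
    stacked = False
    print("default collation failed:", err)
```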

The DataLoader class has a parameter called collate_fn, which controls how the samples in a batch are combined. For example, you can keep the samples of a batch in a list, with each element being one speech sample, and store the corresponding labels in a tensor. For more info, refer to this post.
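Here is a rough sketch of such a collate_fn. The names ToySpeechDataset and list_collate are hypothetical, and the random vectors and 0/1 labels are made up for illustration; your own dataset would return real speech samples instead.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToySpeechDataset(Dataset):
    """Hypothetical dataset: 1-D vectors of different lengths with integer labels."""

    def __init__(self, lengths):
        self.samples = [torch.randn(n) for n in lengths]
        self.labels = [i % 2 for i in range(len(lengths))]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx], self.labels[idx]


def list_collate(batch):
    # batch is a list of (sample, label) tuples produced by __getitem__.
    # Keep the variable-length samples in a plain Python list.
    samples = [sample for sample, _ in batch]
    # Labels all have the same shape, so they fit in a single tensor.
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return samples, labels


dataset = ToySpeechDataset([5, 8, 3, 6])
loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=list_collate)
batches = list(loader)
```

Each batch is then a (list of tensors, label tensor) pair, so the model has to process the samples one by one or pad them itself before running them through the network together.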