Channels in MFCC, and how to use them

Rohan_Kumar · October 29, 2019, 6:33pm

What are channels in MFCC? When I use transform.mfcc, I get an output of shape [2, n_mfcc, time]. My question is: what are the channels, and is the number of channels consistent across similar types of audio (basically within a dataset)? And how do I use the channels in a model: should I concatenate the MFCCs end-to-end?

vincentqb · October 29, 2019, 7:03pm

The conventions we use for dimensions are given in the README. In particular, a waveform has shape (channel, time), and MFCC maps (channel, time) -> (channel, mfcc, time), so MFCC is applied to each channel. Your original waveform must therefore have had 2 channels.

In the datasets we provide, the number of channels is the same across all waveforms.
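If your own data mixes mono and stereo files, one common option (not something the thread prescribes) is to downmix everything to mono by averaging over the channel dimension, so every waveform has shape (1, time) before MFCC. The helper `to_mono` below is hypothetical, not a torchaudio function, and averaging does discard any left/right differences.

```python
import torch

def to_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Average all channels into one, keeping the (channel, time) layout.

    Hypothetical helper for illustration, not part of torchaudio.
    """
    return waveform.mean(dim=0, keepdim=True)

stereo = torch.randn(2, 16000)  # (channel=2, time)
mono = torch.randn(1, 16000)    # (channel=1, time)

batch = [to_mono(stereo), to_mono(mono)]
print([w.shape for w in batch])  # [torch.Size([1, 16000]), torch.Size([1, 16000])]
```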

Since the output of MFCC is just a tensor, you can use torch.cat to concatenate two MFCCs along a given axis.
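A minimal sketch of two ways to combine MFCC tensors with torch.cat, assuming outputs shaped (channel, n_mfcc, time); the random tensors and their sizes are placeholders for real features. Joining along the last dimension puts clips end-to-end in time, while concatenating the per-channel slices along the coefficient dimension gives a model that expects (features, time) 2 * 40 = 80 coefficients per frame.

```python
import torch

# Assumption: two MFCC outputs shaped (channel, n_mfcc, time), e.g. from
# two consecutive clips; random tensors stand in for real features.
mfcc_a = torch.randn(2, 40, 81)
mfcc_b = torch.randn(2, 40, 50)

# End-to-end in time: join along the last (time) dimension.
joined_time = torch.cat([mfcc_a, mfcc_b], dim=-1)
print(joined_time.shape)  # torch.Size([2, 40, 131])

# Fold channels into the feature dimension: channel 0's 40 coefficients
# followed by channel 1's 40 coefficients for every frame.
stacked_features = torch.cat(mfcc_a.unbind(0), dim=0)
print(stacked_features.shape)  # torch.Size([80, 81])
```

Which layout is appropriate depends on the model; concatenating in time requires matching channel and n_mfcc sizes, while stacking channels requires matching time lengths.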

Is that what you were asking?

Rohan_Kumar · October 29, 2019, 7:52pm

It’s pretty much what I asked, thank you. I still want to know what channels actually are. Also, if you use log2 with MFCC, which I think is used, how do you handle the NaN values? Currently I have replaced them with 0s.
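For context: taking the log of an exactly-zero energy (e.g. digital silence) gives -inf rather than NaN, and NaN usually appears later from arithmetic such as -inf minus -inf or from logs of negative values. Replacing those values with 0 maps silent frames to log2(1) = 0, which is louder than many real quiet frames. A common alternative is to clamp to a small floor before the log. As far as I can tell, torchaudio's MFCC already does something similar internally (AmplitudeToDB clamps to a small `amin`, and `log_mels=True` adds a small offset), so this mainly matters when applying the log yourself. The sketch below assumes a random mel spectrogram and an `eps` of 1e-10; both are illustrative choices.

```python
import torch

# Random values stand in for a real (channel, n_mels, time) mel spectrogram.
mel = torch.rand(2, 40, 81)
mel[0, :, :10] = 0.0  # simulate silent frames with exactly zero energy

raw_log = torch.log2(mel)
print(torch.isinf(raw_log).any())  # tensor(True): log2(0) is -inf

# Clamp to a small floor before the log so silent frames stay at the
# bottom of the scale instead of becoming -inf.
# eps is an assumption; pick a value suited to your data's scale.
eps = 1e-10
safe_log = torch.log2(torch.clamp(mel, min=eps))
print(torch.isfinite(safe_log).all())  # tensor(True)
```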