What are channels in MFCC? When I use transforms.MFCC, I get output of shape [2, n_mfcc, time]. What do the channels represent, and is the number of channels consistent across similar audio (basically, within a dataset)? How do I use the channels in a model: should I concatenate the MFCCs end-to-end?
The conventions we use for dimensions are given in the README. In particular, a waveform is (channel, time), and
MFCC maps (channel, time) -> (channel, mfcc, time), so MFCC is applied to each channel independently. Your original waveform must therefore have had 2 channels.
In the datasets we provide, the number of channels is the same across all waveforms.
Since the output of MFCC is just a tensor, you can use torch.cat to concatenate two MFCCs along a given axis.
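As a rough sketch of the concatenation options, the snippet below uses random tensors as stand-ins for real MFCC outputs. The sizes (13 coefficients, 100 and 60 frames) are assumptions chosen only for the example. Which axis to concatenate along depends on what the model expects.

```python
import torch

n_mfcc, frames_a, frames_b = 13, 100, 60  # illustrative sizes

# Stand-ins for two MFCC outputs of shape (channel, n_mfcc, time).
mfcc_a = torch.randn(2, n_mfcc, frames_a)
mfcc_b = torch.randn(2, n_mfcc, frames_b)

# End-to-end in time: all other dimensions must match.
end_to_end = torch.cat([mfcc_a, mfcc_b], dim=-1)  # (2, 13, 160)

# Stack the two channels of one clip along the coefficient axis,
# giving one feature matrix per clip.
stacked_channels = torch.cat([mfcc_a[0], mfcc_a[1]], dim=0)  # (26, 100)

print(end_to_end.shape)
print(stacked_channels.shape)
```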
Is that what you were asking?
It’s pretty much what I asked, thank you. I would still like to know what channels actually are. Also, if log2 is applied with MFCC, which I think it is, how do you handle the NaN values? For now I have replaced them with 0s.