I see that torchaudio.transforms.MFCC and torchaudio.compliance.kaldi.mfcc give different results:
import torchaudio
import torch
torch.manual_seed(0)
torch.set_printoptions(precision=3, sci_mode=False)
wave = torch.rand(1, 400)
# torchaudio mfcc
transform = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False},
)
ta_mfcc = transform(wave)[0].transpose(0, 1)
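# kaldi compliance mfcc (input scaled by 2**15 to the int16 amplitude range)
kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(wave * 2**15, window_type="hanning")
# stack the two (1, 13) results and transpose to (13, 2):
# column 0 is transforms.MFCC, column 1 is kaldi.mfcc
ft = torch.cat([ta_mfcc, kaldi_mfcc]).transpose(0, 1)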
print(ft)
The result is:
tensor([[ 92.246, 115.379],
        [-10.815, -34.377],
        [  2.703, -11.685],
        [  0.333, -15.649],
        [  4.773,  -7.279],
        [  1.226, -13.743],
        [  2.976, -10.609],
        [  6.198,  -2.479],
        [  4.769,  -4.193],
        [  5.665,  -0.910],
        [  5.217,  -0.147],
        [  4.096,  -2.355],
        [  5.315,   1.021]])
Is there a way to configure torchaudio.compliance.kaldi.mfcc so that its result matches that of torchaudio.transforms.MFCC?