Difference between kaldi mfcc and torchaudio transform mfcc

binhtran · December 7, 2023, 6:10am

I noticed that torchaudio.transforms.MFCC and torchaudio.compliance.kaldi.mfcc give different results for the same input:

import torchaudio
import torch

torch.manual_seed(0)
torch.set_printoptions(precision=3, sci_mode=False)

wave = torch.rand(1, 400)

# torchaudio mfcc
transform = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False},
)
ta_mfcc = transform(wave)[0].transpose(0, 1)

# kaldi compliance mfcc
kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(wave * 2**15, window_type="hanning")

ft = torch.cat([ta_mfcc, kaldi_mfcc]).transpose(0, 1)

print(ft)

The result (left column: torchaudio.transforms.MFCC, right column: torchaudio.compliance.kaldi.mfcc) is:

tensor([[ 92.246, 115.379],
        [-10.815, -34.377],
        [  2.703, -11.685],
        [  0.333, -15.649],
        [  4.773,  -7.279],
        [  1.226, -13.743],
        [  2.976, -10.609],
        [  6.198,  -2.479],
        [  4.769,  -4.193],
        [  5.665,  -0.910],
        [  5.217,  -0.147],
        [  4.096,  -2.355],
        [  5.315,   1.021]])

Is there a way to configure torchaudio.compliance.kaldi.mfcc so that its result matches that of torchaudio.transforms.MFCC?