I am having trouble feeding MFCC features into a Conv2d layer, which expects input of shape (N, C, H, W). My MFCC feature tensor, audio_feature, has shape (N, time frames, Mel). Can I just add a channel dimension with audio_feature.unsqueeze(1), which gives (N, 1, time frames, Mel)? Is that correct?
If not, how should I do it? Thanks
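To make the question concrete, here is a minimal sketch of what I mean. The sizes (4, 100, 40) and the Conv2d settings are made up for illustration, and random data stands in for real MFCC features.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only
N, T, M = 4, 100, 40  # batch, time frames, Mel/MFCC bins

# Random stand-in for the real MFCC features: (N, time frames, Mel)
audio_feature = torch.randn(N, T, M)

# Insert a channel dimension at position 1: (N, 1, time frames, Mel)
x = audio_feature.unsqueeze(1)

# A single-channel Conv2d; padding=1 with a 3x3 kernel keeps H and W unchanged
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
out = conv(x)

print(x.shape)    # torch.Size([4, 1, 100, 40])
print(out.shape)  # torch.Size([4, 16, 100, 40])
```

With this layout Conv2d treats the time axis as H and the Mel axis as W, and the feature map is a one-channel "image".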
best wishes
Thank you for the reply. Do you know how to feed MFCC features directly into a ResNet? I mean, if the MFCC feature has shape [batch, time_frame, mel], how should I do it?
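As far as I can tell there are two options, sketched below under my own assumptions rather than anything confirmed in this thread. The standalone `nn.Conv2d` layers are only stand-ins with the same settings as the first convolution of the torchvision ResNets (3 input channels, 64 output channels, 7x7 kernel, stride 2, padding 3). Option A repeats the single channel three times so the input matches an unmodified ResNet. Option B swaps in a first convolution that takes one channel. The sizes are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only
N, T, M = 4, 100, 40  # batch, time_frame, mel
audio_feature = torch.randn(N, T, M)  # stand-in for real MFCC features

# Option A: add a channel dimension, then repeat it to 3 channels
# so the tensor matches the 3-channel input a standard ResNet expects.
x3 = audio_feature.unsqueeze(1).expand(-1, 3, -1, -1)  # (N, 3, T, M)
stem3 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
out3 = stem3(x3)

# Option B: keep a single channel and use a 1-channel first convolution
# (in a real model this would replace the network's first conv layer).
x1 = audio_feature.unsqueeze(1)  # (N, 1, T, M)
stem1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
out1 = stem1(x1)

print(out3.shape)  # torch.Size([4, 64, 50, 20])
print(out1.shape)  # torch.Size([4, 64, 50, 20])
```

Option A keeps any pretrained first-layer weights usable, while Option B avoids redundant channels but means the first layer has to be trained from scratch.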