Audio pronunciation analyzer

ammar_siddiqui · January 18, 2023, 11:36am

I have a dataset of audio files, and I want to create a model that rates user-provided audio as good, moderate, or bad with respect to pronunciation. Which approach can I use to build a model of this kind?

tkuser · January 18, 2023, 12:06pm

If you have sequential data, one of the classical approaches is to use a recurrent neural network such as an LSTM (though I have also used 1D CNNs, and they worked out well). A newer approach is to use a transformer architecture for this kind of task, which in my case gave good performance and has been associated with better handling of out-of-distribution samples (see, for instance, [1]).
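To make the 1D CNN option a bit more concrete, here is a rough sketch of the data flow only: per-frame features go through a convolution over time, then a global average pool, then a three-way softmax. Everything in it is an assumption for illustration and not something from my own setup: the 13 features per frame (an MFCC-like size), the filter count, the kernel size, and the helper names are all hypothetical, and the weights are random and untrained. It is written in plain Python so it runs without dependencies; for real training you would express the same layers in a deep learning framework.

```python
import math
import random

LABELS = ["good", "moderate", "bad"]  # the three ratings from the question


def conv1d(frames, kernels, biases):
    """Valid 1D convolution over time, followed by ReLU.

    frames:  list of T feature vectors, each of length n_features
    kernels: list of n_filters kernels; each kernel is a list of
             kernel_size vectors of length n_features
    returns: list of T - kernel_size + 1 vectors of length n_filters
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            acc = bias
            for dt in range(k):
                for w, x in zip(kernel[dt], frames[t + dt]):
                    acc += w * x
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


def global_avg_pool(seq):
    """Average over time so clips of any length give a fixed-size vector."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(n_features, n_filters, kernel_size, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def w():
        return rng.uniform(-0.1, 0.1)

    return {
        "kernels": [[[w() for _ in range(n_features)]
                     for _ in range(kernel_size)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_weights": [[w() for _ in range(n_filters)] for _ in LABELS],
        "out_bias": [0.0] * len(LABELS),
    }


def classify(frames, params):
    """Forward pass: conv over time -> pool -> linear -> softmax."""
    hidden = conv1d(frames, params["kernels"], params["conv_bias"])
    pooled = global_avg_pool(hidden)
    logits = [sum(w * h for w, h in zip(row, pooled)) + b
              for row, b in zip(params["out_weights"], params["out_bias"])]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


# Stand-in for one clip: 50 frames of 13 made-up feature values each.
rng = random.Random(1)
clip = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(50)]
params = init_params(n_features=13, n_filters=8, kernel_size=5)
label, probs = classify(clip, params)
print(label, [round(p, 3) for p in probs])
```

Since the weights are untrained, the printed label is meaningless; the point is only that a variable-length sequence of frames ends up as one probability per rating. The pooling step is what lets clips of different durations share one classifier.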

[1] Exploring the Limits of Out-of-Distribution Detection

ammar_siddiqui · January 19, 2023, 6:43am

Will look into it. Thanks