Hello. I am using DeepSpeech2 for Korean speech recognition.
I trained on nearly 20,000 audio files, about 51.6 hours in total.
I extracted 1,202 distinct characters from the transcripts, so the vocabulary is 1202 + 1 (pad token) + 1 (space) = 1204 entries. That means the FC layer output size is 1204.
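To make the counting concrete, here is a minimal sketch of how a vocabulary like this could be built. It is not my exact code: the function name `build_vocab`, the `<pad>` label, the index order, and the toy transcripts are all just assumptions for illustration.

```python
def build_vocab(transcripts):
    """Map the pad token, the space, and every distinct character to an integer id."""
    # Distinct non-space characters found in the transcripts.
    chars = sorted({ch for line in transcripts for ch in line if ch != " "})
    # Two reserved entries: pad token and space (index order is an assumption).
    vocab = {"<pad>": 0, " ": 1}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


# Toy data only, not the real corpus.
transcripts = ["안녕 하세요", "안녕"]
vocab = build_vocab(transcripts)

# FC layer output size = distinct characters + pad + space.
output_size = len(vocab)
```

With 1,202 distinct characters in the transcripts, the same counting gives 1202 + 2 = 1204.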
But the loss is barely decreasing. I suspect the problem is either my preprocessing or that the amount of audio data is not enough.
Help me…