Speech to text in PyTorch

I have to create a reusable library that can convert a paragraph of spoken English into written English. For example, “two dollars” should be converted to “$2”, and abbreviations spoken as “C M” or “Triple A” should be written as “CM” and “AAA”, respectively. How should I approach this problem in PyTorch?

This isn't really a problem that calls for neural nets, since verbalization is largely rule-based. I suggest you take a look at https://github.com/google/sparrowhawk, which tackles the inverse problem ($2 --> two dollars). Once you've spent some time learning about finite-state transducers (FSTs), it's not too hard to flip things around. There is some work on doing this with neural nets, but those models tend to be trained on corpora generated by rule-based grammars.
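To make the rule-based idea concrete, here is a minimal pure-Python sketch. It is not Sparrowhawk's API and it uses plain lookup tables instead of real FSTs. Every name in it (`verbalize`, `normalize`, `SPOKEN_TO_NUMBER`, etc.) is hypothetical. It writes the forward rule (number to spoken words) for 0 to 99, then inverts that table to get the spoken-to-written direction. This is a toy version of "flipping" a verbalizer; FST toolkits such as OpenFst provide an inversion operation for this. It assumes the transcript marks spelled-out letters as single uppercase tokens ("C", "M", "A").

```python
# Toy inverse text normalization: spoken English -> written English.
# Hypothetical sketch; real systems use weighted FST grammars, not dicts.

UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
CURRENCY = {"dollar": "$", "dollars": "$"}
MULTIPLIERS = {"double": 2, "triple": 3}


def verbalize(n: int) -> str:
    """Forward direction (written -> spoken) for 0..99, e.g. 25 -> 'twenty five'."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else f"{TENS[tens]} {UNITS[unit]}"


# Flip the forward rule into an inverse lookup: 'twenty five' -> 25.
SPOKEN_TO_NUMBER = {verbalize(n): n for n in range(100)}


def is_letter(tok: str) -> bool:
    # Assumption: spelled-out letters arrive as single uppercase tokens.
    return len(tok) == 1 and tok.isupper()


def normalize(text: str) -> str:
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        # Rule 1: "<number words> dollar(s)" -> "$<digits>", longest match first.
        matched = False
        for span in (2, 1):
            if i + span >= len(tokens):
                continue  # need room for the currency word after the number
            phrase = " ".join(tokens[i:i + span]).lower()
            unit = tokens[i + span].lower()
            if phrase in SPOKEN_TO_NUMBER and unit in CURRENCY:
                out.append(f"{CURRENCY[unit]}{SPOKEN_TO_NUMBER[phrase]}")
                i += span + 1
                matched = True
                break
        if matched:
            continue
        # Rule 2: runs of spelled letters, expanding "double"/"triple" repeats.
        letters, j = [], i
        while j < len(tokens):
            word = tokens[j].lower()
            if (word in MULTIPLIERS and j + 1 < len(tokens)
                    and is_letter(tokens[j + 1])):
                letters.append(tokens[j + 1] * MULTIPLIERS[word])
                j += 2
            elif is_letter(tokens[j]):
                letters.append(tokens[j])
                j += 1
            else:
                break
        if j - i >= 2:  # a lone "I" or "A" is left untouched
            out.append("".join(letters))
            i = j
            continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)


print(normalize("the fee is two dollars at the Triple A office"))
print(normalize("measure ten C M"))
```

With these rules the two calls print `the fee is $2 at the AAA office` and `measure ten CM`. Note that "ten" stays as a word because only numbers followed by a currency word are rewritten here. Deciding when a bare cardinal should become digits is exactly the kind of context-dependent choice that grammar frameworks like Sparrowhawk are built to express.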
