Ways to run online streaming ASR inference

Hi @nateanl,

For online streaming ASR inference, I have found two approaches:

  1. There is a tutorial listed here. I wonder whether it is designed only for CPU? (I ask because this page is available only for v0.12.0dev+cpu.) And is there any plan to extend it to GPU inference?

  2. v0.11.0/examples/asr/librispeech_emformer_rnnt/pipeline_demo.py provides a demo for online inference. It is a feature of v0.11.0 but has been removed from the main branch. I wonder whether it is stable and can reproduce the results shown in the video in Revise RNN-T pipeline streaming decoding logic by hwangjeff · Pull Request #2192 · pytorch/audio · GitHub?

Further, are there any other ways to implement online ASR on GPU?

Looking forward to your reply. Thank you sincerely.

Hi @Rachel_Zhang,

  1. It can easily be adapted to work on GPU. Moving the decoder to GPU:

decoder = bundle.get_decoder().to(device="cuda")

and moving the features and length tensors in the inference loop to GPU:

features = features.to(device="cuda")
length = length.to(device="cuda")

should suffice. (A fuller sketch of the adapted loop follows after this list.)

  2. That demo script still exists; it has just been moved to audio/pipeline_demo.py at main · pytorch/audio · GitHub.
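For reference, here is a minimal sketch of the tutorial's inference loop with those changes applied, written against the v0.12-era RNNTBundle API. Note that stream_audio_chunks() is a hypothetical placeholder for however you source audio segments (e.g., the tutorial's StreamReader feeding a ContextCacher); everything else follows the tutorial's structure.

    import torch
    import torchaudio

    device = torch.device("cuda")

    bundle = torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
    feature_extractor = bundle.get_streaming_feature_extractor()
    decoder = bundle.get_decoder().to(device=device)  # move the beam-search decoder to GPU
    token_processor = bundle.get_token_processor()

    state, hypothesis = None, None
    with torch.inference_mode():
        # stream_audio_chunks() is a hypothetical stand-in for your audio source.
        for segment in stream_audio_chunks():
            features, length = feature_extractor(segment)
            # move the inputs to GPU alongside the decoder
            features = features.to(device=device)
            length = length.to(device=device)
            hypos, state = decoder.infer(
                features, length, 10, state=state, hypothesis=hypothesis
            )
            hypothesis = hypos[0]
            transcript = token_processor(hypothesis[0], lstrip=False)
            print(transcript, end="", flush=True)

The first chunk passes state=None and hypothesis=None; after that, the decoder state and best hypothesis are carried between chunks, which is what makes the decoding streaming-capable.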

Thank you very much, @hwangjeff!