Ways to run online streaming ASR inference

Hi @nateanl,

For online streaming ASR inference, I have found two approaches:

  1. There is a tutorial listed here. I wonder whether it is designed only for CPU? (I ask because this page is available only for v0.12.0dev+cpu.) And is there any plan to extend it to GPU inference?

  2. v0.11.0/examples/asr/librispeech_emformer_rnnt/pipeline_demo.py provides a demo for online inference. It is a feature of v0.11.0 but has been removed from the main branch. I wonder whether it is stable and can reproduce the results shown in the video in Revise RNN-T pipeline streaming decoding logic by hwangjeff · Pull Request #2192 · pytorch/audio · GitHub?

Further, are there any other ways to implement online ASR on GPU?

Looking forward to your reply. Thank you sincerely.

Hi @Rachel_Zhang,

  1. It can easily be adapted to work on GPU. Moving the decoder to GPU:

decoder = bundle.get_decoder().to(device="cuda")

and moving the features and length tensors in the inference loop to GPU:

features = features.to(device="cuda")
length = length.to(device="cuda")

should suffice. (A fuller sketch of the adapted loop follows after this list.)

  2. That demo script still exists; it has just been moved to audio/pipeline_demo.py at main · pytorch/audio · GitHub.
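For reference, here is a minimal sketch of the tutorial's inference loop with those changes applied, written against the v0.12-era RNNTBundle API. Note that stream_audio_chunks() is a hypothetical placeholder for however you source audio segments (e.g., the tutorial's StreamReader feeding a ContextCacher); everything else follows the tutorial's structure.

    import torch
    import torchaudio

    device = torch.device("cuda")

    bundle = torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
    feature_extractor = bundle.get_streaming_feature_extractor()
    decoder = bundle.get_decoder().to(device=device)  # move the beam-search decoder to GPU
    token_processor = bundle.get_token_processor()

    state, hypothesis = None, None
    with torch.inference_mode():
        # stream_audio_chunks() is a hypothetical stand-in for your audio source.
        for segment in stream_audio_chunks():
            features, length = feature_extractor(segment)
            # move the inputs to GPU alongside the decoder
            features = features.to(device=device)
            length = length.to(device=device)
            hypos, state = decoder.infer(
                features, length, 10, state=state, hypothesis=hypothesis
            )
            hypothesis = hypos[0]
            transcript = token_processor(hypothesis[0], lstrip=False)
            print(transcript, end="", flush=True)

The first chunk passes state=None and hypothesis=None; after that, the decoder state and best hypothesis are carried between chunks, which is what makes the decoding streaming-capable.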

Thank you very much, @hwangjeff!