Has anyone tried to apply quantization to the StreamingASR Android demo before?
Could someone provide some guidance on what to do?
I am not sure why this error occurred; is the model just not supported yet?
I am trying to apply dynamic quantization to an Emformer-RNNT-based model.
This is the quantization code that I use:
```python
import torch

model_pquant = torch.quantization.quantize_dynamic(
    wrapper,                           # the original model
    {torch.nn.LSTM, torch.nn.Linear},  # the set of layers to dynamically quantize
    dtype=torch.qint8)                 # the target dtype for quantized weights
torch.save(model_pquant, 'quant_dynamic_Emformer-RNNT_pure')
```
During inference, it produces this error:

```
RuntimeError: In ChooseQuantizationParams, min should be less than or equal to max
```
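For context, my understanding (an assumption on my part, not from the docs) is that dynamic quantization computes the min/max of each activation tensor at runtime to pick quantization parameters, so a malformed input, e.g. one containing NaN, makes the `min <= max` check inside `ChooseQuantizationParams` fail. A minimal sketch of what I mean:

```python
import torch

# Dynamically quantize a plain Linear layer, the same way as above.
lin = torch.quantization.quantize_dynamic(
    torch.nn.Linear(4, 2), {torch.nn.Linear}, dtype=torch.qint8)

ok = lin(torch.randn(1, 4))  # well-formed input: runs fine

try:
    # NaN input: min/max of the activations are NaN, so the
    # "min should be less than or equal to max" check can fire.
    lin(torch.full((1, 4), float("nan")))
except RuntimeError as e:
    print(e)
```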
After further analysis, I found that the error is caused by a malformed input tensor, since I was trying to replace the `pyaudio.PyAudio()` streaming input with an audio file.
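To mimic the streaming input from a file, I now split the waveform into fixed-size chunks and sanity-check each one before feeding it to the model. A rough sketch (`chunk_waveform` is just a helper I wrote, and the 640-sample chunk length is a placeholder; adjust it to whatever your pipeline expects):

```python
import torch

def chunk_waveform(mono, chunk_len=640):
    """Split a 1-D waveform into fixed-size chunks, zero-padding the last one."""
    for start in range(0, mono.numel(), chunk_len):
        chunk = mono[start:start + chunk_len]
        if chunk.numel() < chunk_len:  # pad the final partial chunk
            chunk = torch.nn.functional.pad(chunk, (0, chunk_len - chunk.numel()))
        # Guard against empty or NaN chunks, which is the kind of malformed
        # input that triggered the ChooseQuantizationParams error for me.
        assert chunk.numel() == chunk_len and torch.isfinite(chunk).all()
        yield chunk
```

The waveform itself can come from `torchaudio.load(path)` (averaging channels to mono first), in place of the `pyaudio.PyAudio()` stream.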