PyTorch Model on Mobile

I retrained a model based on deepspeech2 to transcribe audio to text by following a blog post. Then I followed the PyTorch Mobile documentation because I would like to use the model on a mobile Android/iOS device.
To be able to run the model I need to load the 16000 Hz WAV audio and transform it into a Mel spectrogram. I am doing this with the transforms.MelSpectrogram function, but right now this transformation lives inside the Dataset module.
Is it correct to move this transformation into the Model module, applied at the start of forward?
Will adding this transformation inside the model create problems when I call torch.jit.script?

I think you can add the transformation inside the model, as long as scripting the model doesn’t complain about unsupported methods. A quick check of transforms.MelSpectrogram suggests it only uses PyTorch operations, so I would guess it works.
If I’m not mistaken, @tom also used a similar workflow by adding transformations into the model to export it to his mobile in a blog post.

One thing that is a limitation here is the coverage of stft/fft in the backend.
stft maps to the internal _fft_r2c, which in turn is only implemented on the CPU through MKL.
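You can probe whether the FFT backend is available on a given build by simply trying it — a small check along these lines (the helper name is my own):

```python
import torch


def stft_available() -> bool:
    # torch.stft falls through to the build's FFT backend; if that backend
    # is missing (e.g. no MKL on this platform), it raises a RuntimeError.
    try:
        wav = torch.randn(16000)
        torch.stft(wav, n_fft=400, return_complex=True)
        return True
    except RuntimeError:
        return False
```

Running this once on the target device tells you early whether the spectrogram path will work at all.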

I have an upcoming tutorial (in a week or two) on keyword recognition on ARM (Raspberry Pi rather than mobile, though), where I use the M5 model from the tutorial to avoid the need for a spectrogram. It also has a few wrinkles.
For Linux on ARM you can always get numpy’s fft (like librosa does) but that might not be as straightforward on mobile.

Best regards
