Change sample rate using torchaudio

Is there any way to change the sample rate with torchaudio, either when loading a file or afterwards via a transform, similar to how librosa allows `librosa.load('soundfile.mp3', sr=16000)`? This is an essential feature, since most ML models expect audio at a fixed sample rate, but I can't find it anywhere in the docs.


It’s not possible with the torchaudio library as of now. You could preprocess your files with a shell script that uses sox to do this.

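```bash
#!/bin/bash

TMPDIR=/tmp/sox

# -print0 / read -d '' keeps filenames containing spaces intact
find . -name "*.wav" -print0 | while IFS= read -r -d '' fn; do
  TMPFILE="$TMPDIR/$(basename "$fn")"
  sox "$fn" "$TMPFILE" rate 16000   # resample to 16 kHz into the temp dir
  mv "$TMPFILE" "$fn"               # overwrite the original file
done
```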


Thanks David. Do you think there’s any chance that this feature will be added to torchaudio in a later release? It would be very useful to have.

Maybe. I've got some time and have been playing around with the PyTorch C++ extensions, so I'll look into how difficult it would be to integrate more of the sox functions into the torchaudio library. But no promises.

Hi David,
I just got round to running your script, but I get this error:

```
sox FAIL formats: can't open output file `/tmp/sox/filename.mp3': No such file or directory
mv: cannot stat '/tmp/sox/filename.mp3': No such file or directory
```

Any ideas why this is happening?

Yes, `/tmp/sox` doesn't exist. You need to create it first (`mkdir -p /tmp/sox`) or use a different temporary directory.
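For anyone who would rather drive this from Python, here is a rough equivalent of the sox loop in this thread. It is a sketch that assumes the `sox` binary is on your PATH; `resample_tree` and `sox_rate_cmd` are hypothetical helper names, not part of torchaudio or sox. It creates its own scratch directory with `tempfile.TemporaryDirectory`, so there is no hard-coded `/tmp/sox` to forget, and it passes arguments as a list, so filenames with spaces survive.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def sox_rate_cmd(src: Path, dst: Path, rate: int) -> list:
    """Build the argv for `sox SRC DST rate RATE` (same sox call as the shell loop)."""
    return ["sox", str(src), str(dst), "rate", str(rate)]


def resample_tree(root, rate=16000, pattern="*.wav", run=subprocess.run):
    """Resample every file matching `pattern` under `root` in place via sox.

    Output is written to a temporary directory that is created and removed
    automatically. `run` is injectable so the loop can be tried without sox.
    """
    done = []
    with tempfile.TemporaryDirectory(prefix="sox-") as tmp:
        # Materialise the file list first, since we overwrite files while looping
        for src in sorted(Path(root).rglob(pattern)):
            dst = Path(tmp) / src.name
            run(sox_rate_cmd(src, dst, rate), check=True)  # raises if sox fails
            shutil.move(str(dst), str(src))                  # overwrite the original
            done.append(src)
    return done
```

Usage would be something like `resample_tree("path/to/dataset", rate=16000)`. Like the shell version, this overwrites the originals, so try it on a copy first.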

@Blaze I know nothing in torchaudio does this for you yet, but I needed a differentiable (back-propagable) way to change the sample rate, so I'm using this:

```python
import torch
import torch.nn as nn
import torchaudio


class ChangeSampleRate(nn.Module):
    """Differentiable resampling by linear interpolation between neighbouring samples.

    No anti-aliasing low-pass filter is applied, so downsampling can alias.
    """

    def __init__(self, input_rate: int, output_rate: int):
        super().__init__()
        self.output_rate = output_rate
        self.input_rate = input_rate

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # Only accepts 1-channel input: (batch, time) or (batch, 1, time) -> (batch, new_time)
        wav = wav.reshape(wav.size(0), -1)
        new_length = wav.size(-1) * self.output_rate // self.input_rate
        # Fractional input position of each output sample, on the same device as wav
        indices = torch.arange(new_length, device=wav.device) * (self.input_rate / self.output_rate)
        round_down = wav[:, indices.long()]
        round_up = wav[:, (indices.long() + 1).clamp(max=wav.size(-1) - 1)]
        frac = indices.fmod(1.).unsqueeze(0)  # weight given to the round_up sample
        output = round_down * (1. - frac) + round_up * frac
        return output


if __name__ == '__main__':
    wav, sr = torchaudio.load('small_stuff/original.wav')  # (channels, time); mono expected
    osr = 22050
    batch = wav.unsqueeze(0).repeat(10, 1, 1)  # (10, 1, time): a batch of 10 copies
    csr = ChangeSampleRate(sr, osr)
    out_wavs = csr(batch)  # (10, new_time)
    torchaudio.save('down1.wav', out_wavs[:1], osr)  # save expects (channels, time)
```
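To see what `ChangeSampleRate.forward` computes, here is the same index arithmetic in plain Python on a 1-D list. This is an illustrative sketch only; `linear_resample` is a hypothetical name, not a torchaudio function. Output sample `i` sits at fractional input position `i * input_rate / output_rate` and is a weighted average of the two input samples around it.

```python
def linear_resample(samples, input_rate, output_rate):
    """Resample a 1-D list of floats by linear interpolation (no anti-aliasing filter)."""
    n_in = len(samples)
    new_length = n_in * output_rate // input_rate  # same length rule as the module
    step = input_rate / output_rate                # input samples per output sample
    out = []
    for i in range(new_length):
        pos = i * step                  # fractional position in the input
        lo = int(pos)                   # "round_down" index
        hi = min(lo + 1, n_in - 1)      # "round_up" index, clamped at the last sample
        frac = pos - lo                 # weight given to the round_up sample
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


print(linear_resample([0.0, 1.0, 2.0, 3.0], 2, 4))
```

Because this is plain linear interpolation, downsampling can alias content above the new Nyquist frequency (half the output rate), whereas sox's `rate` effect uses a band-limited resampler. For training-time augmentation that may be acceptable; for preprocessing a dataset, the sox route is the safer choice.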