Change sample rate using torchaudio

Blaze · August 15, 2018, 12:38pm

Is there any way of changing the sample rate using torchaudio, either when loading a file or afterwards via a transform, similar to how librosa allows librosa.load('soundfile.mp3', sr=16000)? This is an essential feature, since most ML models expect audio at a fixed sample rate, but I cannot find it anywhere in the docs.

dhpollack · August 15, 2018, 1:18pm

It’s not possible with the torchaudio library as of now. You could preprocess your files with a shell script that uses sox to do this.

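#!/bin/bash
# Resample every .wav file under the current directory to 16 kHz, in place.
# TMPDIR must already exist, otherwise sox cannot write its output file.

TMPDIR=/tmp/sox

# -print0 / read -d '' keeps file names with spaces intact
find . -name "*.wav" -print0 | while IFS= read -r -d '' fn; do
  TMPFILE="$TMPDIR/$(basename "$fn")"
  # Only overwrite the original if sox succeeded
  sox "$fn" "$TMPFILE" rate 16000 && mv "$TMPFILE" "$fn"
done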

Blaze · August 15, 2018, 1:24pm

Thanks David. Do you think there’s any chance that this feature will be added to torchaudio in a later release? It would be very useful to have.

dhpollack · August 15, 2018, 1:26pm

Maybe, I’ve got some time and have been playing around with the pytorch cpp extensions. I’ll look into how difficult it would be to integrate more of the sox functions into the torchaudio library. But no promises.

Blaze · August 15, 2018, 2:30pm

Hi David,
I just got round to running your script, but I get this error:

sox FAIL formats: can't open output file `/tmp/sox/filename.mp3': No such file or directory
mv: cannot stat '/tmp/sox/filename.mp3': No such file or directory

Any ideas why this is happening?

dhpollack · August 15, 2018, 3:00pm

Yes, /tmp/sox doesn’t exist. You need to create it first (for example with mkdir -p /tmp/sox), or use a different temporary directory.
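
One way to sidestep the missing-directory problem is to let the script create and clean up its own temporary directory. Here is a minimal Python sketch of the same loop, assuming sox is installed and on the PATH. The names sox_rate_command and resample_tree, and the runner parameter, are made up for illustration and are not part of any library.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def sox_rate_command(src, dst, rate):
    """Build the command the shell script runs: sox SRC DST rate RATE."""
    return ["sox", str(src), str(dst), "rate", str(rate)]


def resample_tree(root, rate=16000, pattern="*.wav", runner=subprocess.run):
    """Resample every matching file under `root` in place; return the files processed."""
    processed = []
    # TemporaryDirectory creates the directory and removes it afterwards,
    # so there is no fixed /tmp/sox that has to exist beforehand.
    with tempfile.TemporaryDirectory() as tmpdir:
        for src in sorted(Path(root).rglob(pattern)):
            tmpfile = Path(tmpdir) / src.name
            # With subprocess.run, check=True raises CalledProcessError if sox
            # fails, so the original is not replaced by a broken file.
            runner(sox_rate_command(src, tmpfile, rate), check=True)
            shutil.move(str(tmpfile), str(src))
            processed.append(src)
    return processed
```

Calling resample_tree(".") would then do roughly what the shell script does. The runner parameter only exists so the loop can be tried out without sox installed.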

Amin_Jun · October 16, 2020, 6:18pm

@Blaze As far as I know, torchaudio has nothing built in that does this for you, but I needed a back-propagable (differentiable) method for changing the sample rate. So I’m using this linear-interpolation module:

import torch
import torch.nn as nn
import torchaudio


class ChangeSampleRate(nn.Module):
    """Differentiable resampling by linear interpolation (no anti-aliasing filter)."""

    def __init__(self, input_rate: int, output_rate: int):
        super().__init__()
        self.output_rate = output_rate
        self.input_rate = input_rate

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # Only accepts 1-channel waveform input: (batch, time) or (batch, 1, time)
        wav = wav.view(wav.size(0), -1)
        new_length = wav.size(-1) * self.output_rate // self.input_rate
        # Fractional position of each output sample on the input time axis
        indices = torch.arange(new_length, device=wav.device) * (self.input_rate / self.output_rate)
        frac = indices.fmod(1.).unsqueeze(0)
        round_down = wav[:, indices.long()]
        round_up = wav[:, (indices.long() + 1).clamp(max=wav.size(-1) - 1)]
        # Weighted average of the two neighbouring input samples
        output = round_down * (1. - frac) + round_up * frac
        return output


if __name__ == '__main__':
    wav, sr = torchaudio.load('small_stuff/original.wav')  # (1, time) for a mono file
    osr = 22050
    batch = wav.unsqueeze(0).repeat(10, 1, 1)  # (10, 1, time)
    csr = ChangeSampleRate(sr, osr)
    out_wavs = csr(batch)  # (10, new_time)
    # torchaudio.save expects a 2-D (channels, time) tensor
    torchaudio.save('down1.wav', out_wavs[0].unsqueeze(0), osr)
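
The index arithmetic in forward is easier to follow on a tiny input. Below is a plain-Python sketch of the same linear interpolation (no torch, so not differentiable); linear_resample is a made-up name for illustration. Keep in mind that linear interpolation applies no anti-aliasing filter, so downsampling this way can alias high frequencies.

```python
def linear_resample(samples, input_rate, output_rate):
    """Resample a list of floats by linear interpolation between neighbours."""
    new_length = len(samples) * output_rate // input_rate
    step = input_rate / output_rate  # spacing of output samples, in input samples
    last = len(samples) - 1
    out = []
    for i in range(new_length):
        pos = i * step          # fractional position on the input axis
        lo = int(pos)           # sample at or before pos ("round_down")
        hi = min(lo + 1, last)  # next sample, clamped at the end ("round_up")
        frac = pos - lo         # how far pos lies between lo and hi
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Upsampling 2x inserts midpoints; the final value repeats because of the clamp.
print(linear_resample([0.0, 1.0, 2.0, 3.0], input_rate=4, output_rate=8))
# → [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]

# Downsampling 2x simply keeps every second sample.
print(linear_resample([0.0, 1.0, 2.0, 3.0], input_rate=4, output_rate=2))
# → [0.0, 2.0]
```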

Kenan · August 23, 2022, 5:41pm

I understand this is old, but no one here has mentioned torchaudio’s built-in resampling:

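import torchaudio
# x['path'] is the path to an audio file; new_sr is the target sample rate (e.g. 16000)
arr, org_sr = torchaudio.load(x['path'])
arr = torchaudio.functional.resample(arr, orig_freq=org_sr, new_freq=new_sr)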