Is there any way of changing the sample rate using torchaudio, either when loading the audio or afterwards via a transform, similar to how librosa allows librosa.load('soundfile.mp3', sr=16000)? This is an essential feature to have, as ML models typically expect audio at one fixed sample rate, but I cannot find it anywhere in the docs.
It’s not possible with the torchaudio library as of now. You could preprocess your files with a shell script that uses sox to do this.
#!/bin/bash
TMPDIR=/tmp/sox
for fn in $(find . -name "*.wav"); do
    TMPFILE="$TMPDIR/$(basename "$fn")"
    sox "$fn" "$TMPFILE" rate 16000
    mv "$TMPFILE" "$fn"
done
Thanks David. Do you think there’s any chance that this feature will be added to torchaudio in a later release? It would be very useful to have.
Maybe. I’ve got some time and have been playing around with the PyTorch C++ extensions. I’ll look into how difficult it would be to integrate more of the sox functions into torchaudio, but no promises.
Hi David,
I just got round to running your script, but I get the error
sox FAIL formats: can't open output file `/tmp/sox/filename.mp3': No such file or directory
mv: cannot stat '/tmp/sox/filename.mp3': No such file or directory
Any ideas why this is happening?
Yes, /tmp/sox doesn’t exist. You need to create it first (mkdir -p /tmp/sox) or use a different temporary directory.
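A Python sketch of the same batch conversion that sidesteps the missing-directory problem by writing each temporary file next to its original instead of in /tmp/sox. It assumes the sox binary is on your PATH; the helper names sox_rate_cmd and resample_tree, and the ".resampled" temporary suffix, are made up for this example.

```python
import os
import subprocess
from pathlib import Path


def sox_rate_cmd(src: Path, dst: Path, rate: int) -> list:
    """Build the same `sox IN OUT rate RATE` call the shell script uses."""
    return ["sox", str(src), str(dst), "rate", str(rate)]


def resample_tree(root: str, rate: int = 16000, pattern: str = "*.wav") -> int:
    """Resample every file matching `pattern` under `root` in place; return the count."""
    # Collect the file list up front so the temporary files created below
    # are never picked up by the directory walk.
    files = sorted(Path(root).rglob(pattern))
    for src in files:
        # Write the temp file in the same directory with the same extension,
        # so sox can infer the output format, no /tmp/sox is needed, and
        # os.replace stays on a single filesystem.
        tmp = src.with_name(src.stem + ".resampled" + src.suffix)
        subprocess.run(sox_rate_cmd(src, tmp, rate), check=True)
        os.replace(tmp, src)
    return len(files)
```

For example, resample_tree(".", 16000, "*.mp3") would cover the mp3 case from the error above, provided your sox build can read and write mp3.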
@Blaze I know torchaudio doesn’t provide anything for this, but I needed a differentiable (back-propagable) way of changing the sample rate, so I’m using this linear-interpolation module:
import torch
import torch.nn as nn
import torchaudio


class ChangeSampleRate(nn.Module):
    """Differentiable resampling by linear interpolation (no anti-aliasing filter)."""

    def __init__(self, input_rate: int, output_rate: int):
        super().__init__()
        self.output_rate = output_rate
        self.input_rate = input_rate

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # Only accepts 1-channel waveforms: flatten to (batch, time)
        wav = wav.reshape(wav.size(0), -1)
        new_length = wav.size(-1) * self.output_rate // self.input_rate
        # Fractional position of each output sample within the input signal
        indices = torch.arange(new_length, device=wav.device) * (self.input_rate / self.output_rate)
        frac = indices.fmod(1.).unsqueeze(0)
        round_down = wav[:, indices.long()]
        round_up = wav[:, (indices.long() + 1).clamp(max=wav.size(-1) - 1)]
        # Linear interpolation between the two neighbouring input samples
        return round_down * (1. - frac) + round_up * frac


if __name__ == '__main__':
    wav, sr = torchaudio.load('small_stuff/original.wav')  # mono, shape (1, time)
    osr = 22050
    batch = wav.unsqueeze(0).repeat(10, 1, 1)  # (10, 1, time)
    csr = ChangeSampleRate(sr, osr)
    out_wavs = csr(batch)  # (10, new_time)
    torchaudio.save('down1.wav', out_wavs[0].unsqueeze(0), osr)  # save expects (channels, time)
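If all you need is differentiable linear interpolation, torch.nn.functional.interpolate can do the same job in a few lines. A minimal sketch, assuming a (batch, time) mono input; change_sample_rate is a hypothetical name:

```python
import torch
import torch.nn.functional as F


def change_sample_rate(wav: torch.Tensor, input_rate: int, output_rate: int) -> torch.Tensor:
    """Differentiable linear-interpolation resampling of a (batch, time) tensor.

    No low-pass filter is applied, so downsampling can alias.
    """
    new_length = wav.size(-1) * output_rate // input_rate
    # F.interpolate expects (batch, channels, length) for 1-D "linear" mode.
    out = F.interpolate(wav.unsqueeze(1), size=new_length, mode="linear", align_corners=False)
    return out.squeeze(1)
```

With align_corners=False the output samples sit at half-sample offsets, so the grid differs slightly from ChangeSampleRate’s. Neither version filters before decimating; torchaudio.functional.resample applies a windowed-sinc filter, which is usually the safer choice when downsampling audio for preprocessing.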
I know this thread is old, but no one here has mentioned torchaudio.functional.resample, which resamples a loaded waveform directly:
import torchaudio
new_sr = 16000  # target sample rate
arr, org_sr = torchaudio.load(x['path'])  # x['path']: path to the audio file
arr = torchaudio.functional.resample(arr, orig_freq=org_sr, new_freq=new_sr)