Removing clicks in an audio file with torchaudio

rodiram · March 15, 2025, 5:24pm

Hello all, I have recorded an audio file for a podcast, but for some reason there are lots of short click sounds, probably originating from some electronic microphone noise. Attached is an example, shown as a spectrogram. I would like to cut this noise and reconstruct the voice signal using torchaudio. Any hints about the method I should use, or a Python example? Or some keywords I could use in Google?
Thanks!

KFrank · March 16, 2025, 3:00pm

Hi Rodiram!

I don’t know off-hand of any pre-packaged tool that would do this for you. (But there
might be one.)

Try searching for “audio denoising” or “audio restoration” together with things like
“pytorch” and / or “cnn”.

I’ve never used torchaudio nor done any machine-learning audio processing. It is,
however, reasonable to imagine training a model to do this for you.

A similar use case exists for image processing where you want to denoise / restore
images. A sensible approach (there are others) is to start with some “clean” images
to which you add noise. The clean images become your ground-truth targets and
the noisy images become the input to your model. Train the model to recover your
clean ground-truth images from your input images to which you’ve added synthetic
noise.

You can now use that trained model to denoise images that have real noise in them.
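
As a rough illustration of that scheme, here is a minimal sketch in PyTorch. Everything specific in it is an assumption made for the example: the "clean" images are random tensors standing in for a real dataset, the noise is plain Gaussian noise, and the two-layer convolutional network, the learning rate, and the step count are arbitrary placeholders rather than a recommended architecture.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in "clean" images (assumption): 32 single-channel 16x16 images.
# In practice these would come from a real dataset of clean images.
clean = torch.rand(32, 1, 16, 16)

def add_noise(x, std=0.1):
    # Synthetic corruption (assumption): additive Gaussian noise.
    return x + std * torch.randn_like(x)

# Deliberately tiny placeholder model: image in, image of the same shape out.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

losses = []
for step in range(100):
    noisy = add_noise(clean)       # model input: clean image plus fresh noise
    pred = model(noisy)            # model's attempt at recovering the clean image
    loss = loss_fn(pred, clean)    # the clean image is the ground-truth target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The key point is only the pairing: the noisy tensor goes into the model and the clean tensor is the target. A real denoiser would need real data, a held-out validation set, and very likely a larger model.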

If you have some clean audio – maybe some podcasts that don’t have the microphone
noise – and some way to add semi-realistic “clicks” to it, you ought to be able to train
an audio “de-clicker” using the same scheme.
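
One possible way to add synthetic clicks is sketched below. The function `add_clicks` and all of its parameters are hypothetical names made up for this example, and modelling a click as a short burst of uniform random noise is an assumption; real microphone clicks may look quite different, so it would be worth comparing against the spectrogram of the actual recording. The waveform is assumed to be a float tensor of shape (channels, samples) with values in [-1, 1], which is the layout `torchaudio.load` returns.

```python
import math
import torch

def add_clicks(waveform, num_clicks=20, max_len=32, amplitude=0.8, generator=None):
    """Return a copy of `waveform` with short random bursts added, plus a
    boolean mask marking which sample positions were touched.

    Hypothetical helper: a click is modelled as 1..max_len samples of
    uniform noise in [-amplitude, amplitude], added to every channel.
    """
    noisy = waveform.clone()
    n = waveform.shape[-1]
    mask = torch.zeros(n, dtype=torch.bool)
    for _ in range(num_clicks):
        length = int(torch.randint(1, max_len + 1, (1,), generator=generator))
        start = int(torch.randint(0, n - length + 1, (1,), generator=generator))
        burst = amplitude * (2 * torch.rand(length, generator=generator) - 1)
        noisy[..., start:start + length] += burst
        mask[start:start + length] = True
    # Keep the result in the usual [-1, 1] range for float audio.
    return noisy.clamp(-1.0, 1.0), mask

# Usage with a stand-in "clean" signal: one second of a 220 Hz sine at 16 kHz.
g = torch.Generator().manual_seed(0)
sample_rate = 16000
t = torch.arange(sample_rate) / sample_rate
clean = 0.5 * torch.sin(2 * math.pi * 220 * t).unsqueeze(0)  # shape (1, 16000)
noisy, mask = add_clicks(clean, generator=g)
```

Pairs of `noisy` (input) and `clean` (target) produced this way could then be fed to a training loop of the kind described above. The mask is not needed for that, but it might be useful for checking how well a trained model does specifically at the click locations.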

Best.

K. Frank