Voice Conversion

Hello,

I have several audio files in .wav format. Each file contains a female or male voice saying one word in the Kazakh language. I would like to increase the number of samples, so that from one audio file I can generate 3-4 samples with clear pronunciation of the word but in a different voice. I have tried various augmentations, but they didn't help much.

Are there any other options?

Hi there,
I was just trying to do the same for my own language.
It is still under construction, but you can check it out here: https://github.com/mube1/voice_clone/blob/main/data.py
Also, this works: the "Audio Data Augmentation" tutorial in the Torchaudio 2.0.1 documentation.
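
For reference, here is a rough sketch of the kind of waveform-level augmentation that page is about: speed perturbation followed by additive noise at a chosen signal-to-noise ratio. It uses plain NumPy rather than the Torchaudio API, and the function names (`change_speed`, `add_noise`, `augment`), the speed factors, and the 20 dB SNR are all assumptions picked for illustration. Note that resampling like this shifts pitch and tempo together, so the result sounds like a somewhat different speaker but it is not true voice conversion.

```python
import numpy as np

def change_speed(wave, factor):
    """Resample by linear interpolation.

    factor > 1 shortens the clip (faster, higher pitch);
    factor < 1 lengthens it (slower, lower pitch).
    """
    n_out = max(1, int(round(len(wave) / factor)))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)

def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def augment(wave, rng, speed_factors=(0.9, 1.0, 1.1), snr_db=20.0):
    """Return one noisy, speed-perturbed variant per speed factor."""
    return [add_noise(change_speed(wave, f), snr_db, rng) for f in speed_factors]

# Stand-in for one recorded word: a 1-second 220 Hz tone at 16 kHz (hypothetical data).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)

rng = np.random.default_rng(0)
variants = augment(tone, rng)
```

With a real recording you would load the .wav into a 1-D float array first (for example with Torchaudio or another audio library) and pass that in instead of `tone`. Keeping the speed factors close to 1.0 should help the word stay intelligible.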

Just wondering, what model are you considering?