Training the PyTorch implementation of 'Tacotron 2' with custom data

Can we train the PyTorch version of Tacotron 2 with our own data?

Yes, you should be able to swap the data from your current script to your custom data.
Do you see any issues with this approach?

Thanks a lot for responding!

tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')

This line loads Tacotron 2 pre-trained on the LJ Speech dataset. How do I load the raw, untrained model so that I can train it with my own data? What is the line for that? Can you help?

I guess torch.hub contains all pretrained models, right? If so, torch.load(PATH) should load the desired raw model. In that case, what path do I have to give to load tacotron2?

If you want to use the raw model and train it, I would recommend checking out this repository, which provides the model definition as well as the training code.
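For anyone who still wants to go through torch.hub, here is a minimal sketch of asking the same entry point for randomly initialized weights. It assumes the nvidia_tacotron2 entry point accepts a pretrained keyword, which is worth confirming in the repository's hub configuration before relying on it. The helper name load_untrained_tacotron2 is made up for illustration. Note that torch.load(PATH) only deserializes a checkpoint you already have on disk; it does not build a model for you.

```python
def load_untrained_tacotron2():
    """Return a Tacotron 2 model without the LJ Speech weights.

    Assumption: the hub entry point takes `pretrained=False`; check the
    repository's hub configuration to confirm. Needs network access the
    first time, because torch.hub fetches the repository code.
    """
    import torch  # imported lazily so defining this helper has no side effects

    model = torch.hub.load(
        'nvidia/DeepLearningExamples:torchhub',
        'nvidia_tacotron2',
        pretrained=False,  # assumed keyword: skip downloading the checkpoint
    )
    model.train()  # switch to training mode (the default for new modules)
    return model
```

This only constructs the model. The data loading, loss, and training loop still come from the repository's training code.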

Can we train this model with the JSUT dataset?

Hi, I am a bit curious about one step. For the default LJSpeech data we need to run bash scripts/prepare_mels.sh. In my opinion we would need the mels for a different dataset as well, but the Multi-dataset section of the documentation does not explicitly mention this step. That said, I was able to start training Tacotron as well as WaveGlow with my own data. So, to phrase it as a question: do we need to run prepare_mels.sh on our own data for both models to run correctly?
Thanks for your time.

I would assume you need to recreate the mel spectra, but feel free to create an issue with this question in the repository.

EDIT:
I just talked to Grzegorz (author of the repo), who explained that prepare_mels.sh lets you load mels directly from your disk instead of processing wav files on the fly, and is therefore recommended.

With mels on disk, use --load-mel-from-disk --training-files=filelists/ljs_mel_text_train_filelist.txt --validation-files=filelists/ljs_mel_text_val_filelist.txt
The filelists specify the paths inside the dataset; see filelists/ljs_mel_text_val_filelist.txt for an example.
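For a custom dataset you need your own filelists. Below is a rough sketch of generating a train/validation pair, assuming each line has the form path|transcript as in the LJ Speech filelists (do compare against filelists/ljs_mel_text_val_filelist.txt before using it). The function name build_filelists, the output file names, and the 10% validation split are all hypothetical choices for illustration.

```python
import random
from pathlib import Path


def build_filelists(transcripts, data_dir, out_dir, ext=".wav",
                    val_fraction=0.1, seed=0):
    """Write train/val filelists and return their lines.

    transcripts: dict mapping a file stem (e.g. "utt_0001") to its text.
    Assumed line format: "<data_dir>/<stem><ext>|<text>". The text must not
    contain "|" itself, since that character is used as the separator.
    """
    lines = [f"{data_dir}/{stem}{ext}|{text.strip()}"
             for stem, text in sorted(transcripts.items())]
    random.Random(seed).shuffle(lines)  # deterministic shuffle for a fixed seed

    n_val = max(1, int(len(lines) * val_fraction))  # keep at least one val item
    val, train = lines[:n_val], lines[n_val:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hypothetical file names; pass them via --training-files / --validation-files.
    (out / "custom_train_filelist.txt").write_text(
        "\n".join(train) + "\n", encoding="utf-8")
    (out / "custom_val_filelist.txt").write_text(
        "\n".join(val) + "\n", encoding="utf-8")
    return train, val
```

If you precompute mels, the same helper can point at them by passing the mel directory as data_dir and the mel file extension as ext, assuming your mel files share the stems of the audio files.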
