Training PyTorch implementation of 'Tacotron 2' with custom data

Can we train the PyTorch version of Tacotron 2 with our own data?

Yes, you should be able to swap out the dataset in your current script for your own data.
Do you see any issues with this approach?

Thanks a lot for responding!

tacotron2 = torch.hub.load('nvidia/DeepLearningExamples:torchhub', 'nvidia_tacotron2')

This line loads Tacotron 2 pre-trained on the LJ Speech dataset. How do I load the raw, untrained model so I can train it on my own data? What is the line for that? Can you help?!

I guess torch.hub contains all pretrained models, right? If so, torch.load(PATH) should load the desired raw model. If so, what is the path I have to give for loading tacotron2?

If you want to use the raw model and train it, I would recommend checking out this repository, which provides the model definition as well as the training code.
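If you prefer to stay with torch.hub, a minimal sketch along these lines may also work: the NVIDIA hub entrypoint accepts a `pretrained` flag, so passing `pretrained=False` should return a randomly initialized model (the exact signature is an assumption here; check `hubconf.py` in the repository to confirm).

```python
import torch

# Hedged sketch: pass pretrained=False to get a randomly initialized
# Tacotron 2 instead of the LJ Speech checkpoint (verify the flag
# against hubconf.py in nvidia/DeepLearningExamples).
tacotron2 = torch.hub.load(
    'nvidia/DeepLearningExamples:torchhub',
    'nvidia_tacotron2',
    pretrained=False,
)
tacotron2.train()  # switch to training mode before training on your data
```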

Can we train this model with JSUT dataset?

Hi, I am a bit curious: for the default LJSpeech data we need the step bash scripts/ In my opinion we would need the mels as well with a different dataset, but the documentation under the point Multi-dataset does not explicitly name this step. Also, I was able to start training Tacotron as well as WaveGlow with my own data. So, to phrase a question: do we need to run this step on our own data to run both models correctly?
Thanks for your time.

I would assume you need to recreate the mel spectrograms, but feel free to open an issue with this question in the repository.
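For reference, the repository's Multi-dataset section describes a preprocessing script roughly along these lines (script name and flag names are recalled from the README and should be verified there before use):

```shell
# Hedged sketch: precompute mel spectrograms for the clips listed in a
# wav filelist, writing a matching mel filelist for training.
python preprocess_audio2mel.py \
    --wav-files filelists/ljs_audio_text_train_filelist.txt \
    --mel-files filelists/ljs_mel_text_train_filelist.txt
```

This is a command-line fragment that only runs inside a checkout of the repository.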

I just talked to Grzegorz (author of the repo), who explained that this option lets you load mels directly from your disk instead of processing the wav files on the fly, and is therefore recommended.

With mels on disk, use --load-mel-from-disk --training-files=filelists/ljs_mel_text_train_filelist.txt --validation-files=filelists/ljs_mel_text_val_filelist.txt
The filelists specify the paths inside the dataset; check filelists/ljs_mel_text_val_filelist.txt as an example.
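The filelists are plain text with one `path|transcript` pair per line. A minimal sketch of writing and reading that format (file names and paths here are hypothetical, just for illustration):

```python
# Sketch of the "path|text" filelist format the training code expects:
# one example per line, path and transcript separated by a pipe.
def write_filelist(entries, path):
    """entries: iterable of (mel_or_wav_path, transcript) pairs."""
    with open(path, 'w', encoding='utf-8') as f:
        for audio_path, text in entries:
            f.write(f"{audio_path}|{text}\n")

def read_filelist(path):
    with open(path, encoding='utf-8') as f:
        # split on the first '|' only, in case the transcript contains one
        return [line.rstrip('\n').split('|', 1) for line in f if line.strip()]

entries = [
    ("mels/my_clip_0001.pt", "Hello world."),
    ("mels/my_clip_0002.pt", "Training with custom data."),
]
write_filelist(entries, "my_train_filelist.txt")
print(read_filelist("my_train_filelist.txt")[0])
# → ['mels/my_clip_0001.pt', 'Hello world.']
```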
