Using SeamlessM4Tv2Model: how can I slow down the speech rate of the audio output?

lam_vu_Nguyen · March 25, 2024, 2:11am

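import soundfile as sf

# processor, model, device and language (the target language code) are assumed to be defined earlier
text_inputs = processor(text="I have a daughter 2 years old, I wanted her name to be Hương Ly", src_lang="eng", return_tensors="pt").to(device)
audio_array = model.generate(**text_inputs, tgt_lang=language)[0].cpu().numpy().squeeze()
file_path = 'audio_from_text.wav'
sf.write(file_path, audio_array, 16000)  # SeamlessM4T produces 16 kHz audio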

doc
[ex]

This returns a 3-second audio clip.

I tried adding speech_temperature=0.2 or speech_do_sample=True to generate(), but nothing changes: it still returns a 3-second clip. I want to slow down the rate of speech so that, for example, the same sentence takes 5 seconds.
Any ideas?
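
One workaround that sidesteps generate() entirely is to time-stretch the waveform after it is produced. The sketch below is only an illustration under that assumption and is not part of the SeamlessM4T API: time_stretch_ola is a hypothetical helper, and the sine wave stands in for the real audio_array. It uses plain windowed overlap-add, which keeps the pitch roughly unchanged but can sound a little rough on speech; a phase-vocoder or WSOLA implementation (for example librosa.effects.time_stretch) would likely sound cleaner.

```python
import numpy as np


def time_stretch_ola(audio, rate, frame_len=1024, hop_out=256):
    """Change duration by a factor of 1/rate using windowed overlap-add.

    rate < 1 slows the audio down (longer output), rate > 1 speeds it up.
    Hypothetical helper for illustration; not a SeamlessM4T function.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    audio = np.asarray(audio, dtype=np.float32)
    hop_in = hop_out * rate                      # input hop, may be fractional
    out_len = int(round(len(audio) / rate))      # target number of samples
    n_frames = max(1, int(np.ceil(out_len / hop_out)))

    window = np.hanning(frame_len).astype(np.float32)
    padded = np.pad(audio, (0, frame_len))       # so every frame is full length
    out = np.zeros(out_len + frame_len, dtype=np.float32)
    norm = np.zeros_like(out)

    for i in range(n_frames):
        start_in = int(round(i * hop_in))
        if start_in >= len(audio):
            break
        start_out = i * hop_out
        frame = padded[start_in:start_in + frame_len]
        # Read frames at the input hop, write them at the output hop.
        out[start_out:start_out + frame_len] += frame * window
        norm[start_out:start_out + frame_len] += window

    norm[norm < 1e-8] = 1.0                      # avoid division by zero at edges
    return (out / norm)[:out_len]


# Stand-in for the 3-second model output (16 kHz mono).
sr = 16000
t = np.arange(3 * sr) / sr
audio_array = 0.1 * np.sin(2 * np.pi * 220 * t).astype(np.float32)

# 3 seconds -> 5 seconds means rate = 3 / 5.
slow = time_stretch_ola(audio_array, rate=3 / 5)
print(len(slow) / sr, "seconds")
```

With the real model output, slow can be written with sf.write(file_path, slow, 16000) exactly as before.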