Hello guys, I'm looking for help optimizing a CSM-1B model that I fine-tuned with Unsloth.
After successfully fine-tuning this TTS model, I was able to run inference with it and generate audio. The problem is that I have been unable to bring the real-time factor (RTF) below 1.0.
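For reference, RTF is the wall-clock time spent generating divided by the duration of the audio produced, so RTF < 1.0 means audio is generated faster than it plays back. Below is a minimal sketch of that calculation. It assumes a 24 kHz output sample rate (the rate of the Mimi codec CSM decodes audio with); check the rate your pipeline actually outputs. `real_time_factor` is a hypothetical helper name.

```python
def real_time_factor(generation_seconds: float, num_samples: int, sample_rate: int = 24_000) -> float:
    """RTF = wall-clock generation time / duration of the generated audio.

    sample_rate defaults to 24 kHz, an assumption for CSM's audio output.
    RTF < 1.0 means audio is produced faster than real time.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return generation_seconds / audio_seconds


# Example: 10 s of audio (240,000 samples at 24 kHz) generated in 6 s
print(real_time_factor(6.0, 240_000))  # 0.6
```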
Compiling the fine-tuned model with the following code did not help:
```python
import torch

try:
    # Compile the depth decoder submodule first, then wrap the whole model
    model.depth_decoder = torch.compile(
        model.depth_decoder,
        mode='reduce-overhead',
        fullgraph=True,
        backend='inductor'
    )
    model = torch.compile(
        model,
        mode='reduce-overhead',
        fullgraph=True,
        backend='inductor'
    )
except Exception as e:
    print(f"Warning: Torch compilation failed: {e}. Proceeding without compilation.")
```
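One thing worth ruling out when measuring RTF: `torch.compile` does not compile anything at the moment it is called. Tracing and compilation happen on the first call(s) of the compiled module, and `mode='reduce-overhead'` additionally records CUDA graphs during those early runs. CUDA kernels also run asynchronously, so the clock should only stop after the GPU has finished. A minimal timing sketch under those assumptions (`benchmark` is a hypothetical helper, not part of PyTorch or Unsloth):

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 5,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock seconds of fn() after `warmup` untimed calls."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    for _ in range(warmup):
        fn()  # early calls pay for compilation / CUDA-graph recording; not timed
    if sync is not None:
        sync()  # drain any queued GPU work from the warm-up calls
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

On a GPU you would pass `sync=torch.cuda.synchronize` and wrap your own inference call, e.g. `benchmark(lambda: generate_once(text), sync=torch.cuda.synchronize)`, where `generate_once` is a hypothetical wrapper around however you synthesize one utterance. The median is used so a single slow outlier (for example, a recompilation triggered by a new input length) does not dominate the result.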
When I use the original, non-fine-tuned CSM-1B model and compile its decoder with the same settings, I get an RTF of 0.6:
```python
import torch

try:
    print(f"Is encoder-decoder: {model.config.is_encoder_decoder}")
    # Only the decoder submodule is compiled here
    model.decoder = torch.compile(model.decoder, mode='reduce-overhead', fullgraph=True, backend='inductor')
except Exception as e:
    print(f"Warning: Torch compilation failed: {e}. Proceeding without compilation.")
```
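Note that in both snippets the `try`/`except` only surrounds the `torch.compile(...)` call. Because compilation is deferred until the compiled module is first invoked, a failure such as a graph break under `fullgraph=True` is raised later, during generation, rather than caught by this block. A minimal sketch that forces compilation inside the `try` by running one warm-up call and falls back to the eager module on failure (`compile_with_warmup` is a hypothetical helper; the example inputs you pass are an assumption and should match the shapes and dtypes of real calls, since different shapes can trigger recompilation later):

```python
import torch


def compile_with_warmup(module, example_args=(), example_kwargs=None, **compile_kwargs):
    """Compile `module` and run one warm-up call so compilation errors surface here.

    Returns the compiled module on success, or the original eager module on failure.
    """
    example_kwargs = example_kwargs or {}
    try:
        compiled = torch.compile(module, **compile_kwargs)
        with torch.no_grad():
            # torch.compile is lazy: tracing and compilation happen on this first call
            compiled(*example_args, **example_kwargs)
    except Exception as e:
        print(f"Warning: Torch compilation failed: {e}. Proceeding without compilation.")
        return module
    return compiled
```

For example, `model.depth_decoder = compile_with_warmup(model.depth_decoder, example_args=..., mode='reduce-overhead', fullgraph=True, backend='inductor')`, where the elided arguments are inputs captured from a real call in your inference code.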