Optimizing a TTS Model

Hello guys, I'm looking for help optimizing inference for a CSM-1B model that I fine-tuned using Unsloth.

After successfully fine-tuning this TTS model, I was able to load it and generate audio. The problem is that I have been unable to bring the real-time factor (RTF) below 1.0.
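
For clarity, by RTF I mean generation time divided by the duration of the audio produced. A minimal sketch of that calculation follows; the helper name `real_time_factor` is hypothetical, and the 24 kHz default sample rate is an assumption about the model's output, so substitute the actual rate of your audio.

```python
def real_time_factor(generation_seconds, num_samples, sample_rate=24_000):
    """Return generation time divided by the duration of the generated audio.

    sample_rate defaults to 24 kHz, which is an assumption; pass the real
    output rate of the model if it differs.
    """
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("generated audio must have a positive duration")
    return generation_seconds / audio_seconds


# Example: 6 seconds of compute for 10 seconds of audio
rtf = real_time_factor(generation_seconds=6.0, num_samples=240_000)
print(f"RTF: {rtf:.2f}")
```

An RTF below 1.0 means the audio is generated faster than it plays back.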

This is the case even after compiling the model with the following instructions:

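        try:
            # Compile the depth decoder on its own first.
            model.depth_decoder = torch.compile(
                model.depth_decoder,
                mode='reduce-overhead',
                fullgraph=True,
                backend='inductor'
            )

            # Then wrap the whole model with the same settings.
            model = torch.compile(
                model,
                mode='reduce-overhead',
                fullgraph=True,
                backend='inductor'
            )

        except Exception as e:
            # Note: torch.compile is lazy, so most compilation errors
            # surface on the first call to the model, not here.
            print(f"Warning: Torch compilation failed: {e}. Proceeding without compilation.")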
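
Since `torch.compile` does its work on the first call(s) rather than when the model is wrapped, the first few generations include compilation time (and, with `mode='reduce-overhead'`, CUDA graph capture). A minimal sketch of a timing helper that excludes warm-up runs is below. The name `time_after_warmup` and its parameters are hypothetical, and `fn` stands in for whatever generation call is being measured; on a GPU, `fn` would also need to call `torch.cuda.synchronize()` before returning so the timer sees the finished work.

```python
import time


def time_after_warmup(fn, warmup=3, iters=5):
    """Run fn() `warmup` times untimed, then return mean seconds per call
    over `iters` timed calls."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up calls absorb one-off costs such as lazy compilation.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


# Example with a stand-in workload instead of a real generation call
mean_seconds = time_after_warmup(lambda: sum(range(10_000)))
print(f"mean time per call: {mean_seconds:.6f} s")
```

Measuring both the fine-tuned and the original model this way, with the same inputs, should make the two RTF numbers directly comparable.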

When I use the original CSM-1B model, which has not been fine-tuned, with the same compilation instructions, I get an RTF of 0.6.

        try:
            print(f"Is encoder-decoder: {model.config.is_encoder_decoder}")
            model.decoder = torch.compile(
                model.decoder,
                mode='reduce-overhead',
                fullgraph=True,
                backend='inductor'
            )
        except Exception as e:
            print(f"Warning: Torch compilation failed: {e}. Proceeding without compilation.")