Compilation error due to use of dropout

Currently, I am using PyTorch version 2.6.0, and I am running into an issue: whenever my model uses dropout or normalization, I have to clone the output tensors to avoid an error. Even when I use

torch.compiler.cudagraph_mark_step_begin()

I still need to add the clone. This is how I compile the model:

torch.compile(self.model, backend="inductor", mode="max-autotune", fullgraph=False)

I do not know whether this is a bug or intended behavior in this version. I am aware that the new release added more memory validation, but now even adding dropout or using RMSNorm forces me to clone the outputs.
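
For reference, this is roughly the pattern I end up with (a minimal sketch, not my actual code; the toy model, layer sizes, and dropout probability are placeholders):

import torch
import torch.nn as nn

# toy stand-in for the real model; any module with dropout shows the behavior
model = nn.Sequential(nn.Linear(64, 64), nn.Dropout(p=0.1)).cuda()
compiled = torch.compile(model, backend="inductor", mode="max-autotune", fullgraph=False)

x = torch.randn(8, 64, device="cuda")

for _ in range(3):
    torch.compiler.cudagraph_mark_step_begin()
    out = compiled(x)
    # without this clone, using `out` after the next iteration raises the
    # "accessing tensor output of CUDAGraphs that has been overwritten by
    # a subsequent run" error
    out = out.clone()

As far as I understand, mode="max-autotune" enables CUDA graphs in Inductor, whose outputs are static buffers that get overwritten on the next run, which is presumably why the clone is needed.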

Could you post a minimal code snippet that reproduces the issue, please?