I am currently using PyTorch 2.6.0, and I am running into an error where, even when I only apply dropout or normalization, I have to clone output tensors. Even when I use
torch.compiler.cudagraph_mark_step_begin()
I still have to clone. This is how I compile the model:
torch.compile(self.model, backend="inductor", mode="max-autotune", fullgraph=False)
I do not know whether this is a bug or intended behavior in this version. I know that recent releases added more memory-safety checks, but now even adding dropout or RMSNorm forces me to clone the output tensors.
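For concreteness, here is a minimal sketch of the pattern I mean. TinyModel is just an illustrative stand-in for my real model (which also uses dropout and RMS normalization); the compile call is the same as above.

import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(64, 64)
        self.norm = torch.nn.RMSNorm(64)
        self.drop = torch.nn.Dropout(p=0.1)

    def forward(self, x):
        return self.drop(self.norm(self.fc(x)))

model = TinyModel().cuda()
compiled = torch.compile(model, backend="inductor",
                         mode="max-autotune", fullgraph=False)

x = torch.randn(8, 64, device="cuda")

# Marking the step boundary before each invocation, as the docs suggest.
torch.compiler.cudagraph_mark_step_begin()
out1 = compiled(x)

torch.compiler.cudagraph_mark_step_begin()
out2 = compiled(x)

# Once CUDA graphs kick in (after warm-up), out1 points at a graph
# output buffer that the second replay may have reused, so touching it
# raises the "overwritten by a subsequent run" error. The only fix I
# have found is cloning at the call site:
# out1 = compiled(x).clone()

My understanding is that mode="max-autotune" enables CUDA graphs, and CUDA graph replays reuse the same output buffers, which would explain the check; switching to mode="max-autotune-no-cudagraphs" would presumably avoid the whole issue, at the cost of the CUDA graph speedup. But is the cloning requirement really intended even when dropout or normalization are the only operations involved?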