How to solve the graph break happen in torch.compile

Ghogha_Atif · February 19, 2025, 4:31am

I am solving the problem of my fine tuning a model am i am using a qlora to fine tune my model and using a torch.compile but i am getting a graph break in my line.how can i solve this what is the possible way anyone guide me for this i have learned the torch.compile is not support the 4 bit quantization how to disable or remove it from the compilation in mlp or in attention layer.

torch_compile_options = {
“epilogue_fusion” : True,
“max_autotune” : True,
“shape_padding” : True,
“trace.enabled” : True,
“triton.cudagraphs” : False,
}

@torch.compile(fullgraph=False,dynamic=True,options=torch_compile_options)

def compiled_llama_mlp(self,x):

down_proj=self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))

return down_proj

transformers.models.llama.modeling_llama.LlamaMLP.forward = compiled_llama_mlp

and from training i am using the hugging face tranier not creating a traning script from scratch

RAMESH_BABU · February 23, 2025, 10:13am

I’m currently encountering the same issue. Torch Dynamo doesn’t support several operations used in the BitsAndBytes (BnB) quantization process. One approach is to modify BnB so that almost every operation becomes traceable by Dynamo; alternatively, we could update Dynamo to support the operations BnB relies on.

For instance, some of the unsupported operations include:

In the Linear4bit module, the Params4bit object contains a transpose operation on a tensor within a user-defined object, which isn’t supported.
Calling A.data_ptr() on a tensor isn’t supported.
Using ctypes.c_void_p variables is also not supported by Dynamo.

While a simple workaround would be to disable Dynamo for these operations and revert to eager mode, that defeats our goal of eliminating graph breaks. Since our aim is to stitch everything into a single graph, the conclusion I’ve reached is that we need to modify some of the parts in BnB and torch._dynamo to support these quantization operations at all graph breaks.

Fun fact: The native model doesn’t trigger any graph breaks, LoRA causes just a single, easily fixable graph break, whereas the QLoRA model results in 39–40 graph breaks.

If anyone has a simpler solution, please share—I’ve already spent over two days fixing these graph break issues and still have 3–4 more to address (around 6–7 unique graph break reasons, which cumulatively result in 39–40 breaks).

bdhirsh · February 23, 2025, 9:09pm

If the BnB code you’re calling is effectively a bunch of custom kernels (which sounds likely given the data pointer access), then the right approach here is generally to refactor bits-and-bytes’ kernels in to custom PyTorch operators, so they can be properly plumbed through the compiler. There’s a nice guide for it here: PyTorch Custom Operators — PyTorch Tutorials 2.6.0+cu124 documentation

niconunezz · March 14, 2025, 10:15pm

how did you fix the first one, i have a similar problem but nothing seems to work?

RAMESH_BABU · March 15, 2025, 5:22am

The issue affects every subclass of Tensor. Fortunately, the torch community has already resolved it in the latest nightly release. You can either switch to this nightly version for an immediate fix or examine its changes and patch them to your current stable version.

niconunezz · March 16, 2025, 9:50pm

I’ve been trying to find the specific commit, but it appears nowhere. Do you have any clue?

byi8220 · March 17, 2025, 6:37pm

I think we ran into the same issue

Digging around in Dynamo Deep-Dive — PyTorch 2.6 documentation I noticed this,

In particular, sometimes we find that an object is tracked as a UserDefinedObjectVariable (this is Dynamo’s catch-all class), when it should have been tracked as something more specific. In these cases, the SourceBuilder.__call__ logic is often to blame.

Would you agree with the idea that this is a torch compiler bug, where torch fails to properly wrap user defined subclasses of Tensor as TensorVariable (a bug which was fixed in nightly?)

Also,

The issue affects every subclass of Tensor.

Every subclass, or just user defined subclasses? Is this actually breaking on torch’s own tensor subclasses, e.g. torch.nn.Parameter?

RAMESH_BABU · March 17, 2025, 10:03pm

I believe the issue originates from the _wrap method in the VariableBuilder class. This bug does not affect PyTorch’s built-in tensor subclasses but specifically impacts user-defined classes that inherit from Tensor.

byi8220 · March 17, 2025, 10:40pm

I believe the issue originates from the _wrap method in the VariableBuilder class.

Hm, where do you think the issue may be and do you know if/how the nightly fixed it? I’m hesitant to just bump up to the latest nightly, since this may cause a ton of version breakage.

IIUC, we can explicitly mark a tensor subclass as traceable via _dynamo.config.traceable_tensor_subclasses, with heavy restrictions.

I suspect maybe one could register Params4bit here, and mess around with __torch_forward__ and __torch_dispatch__ for it to make something work. But I haven’t had any success there.

RAMESH_BABU · March 18, 2025, 3:10am

I was working on this issue for a small assignment without major dependencies. Since I was also busy with my semester exams at the time, I opted for a quicker fix by using the nightly version. I didn’t really look into how they actually solved it.

byi8220 · March 18, 2025, 1:49pm

I was working on this issue for a small assignment without major dependencies.

I think we’re working on the same problem, and ran into the exact same issue. Upgrading to nightly caused my entire env to break, and I would have preferred a cleaner solution over hacking together a bunch of conflicting package versions. But for the sake of time, if it works it works.

RAMESH_BABU · March 18, 2025, 3:16pm

If you’re working on Unsloth puzzles, you might find this repository helpful: