Hi
I have registered a custom backend in PyTorch 2.0 and am running the GPT-J model with both the inductor backend and my custom backend, named ABC.
```
import time

import torch
from transformers import AutoTokenizer, GPTJForCausalLM

tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gptj")
model = GPTJForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gptj")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])  # eager forward pass
input_ids = inputs["input_ids"]

start_time = time.time()  # the measured time includes compilation
print("cpu.................")
fn_cpu = torch.compile(model.generate, backend="inductor")
with torch.no_grad():
    output_cpu = fn_cpu(input_ids, do_sample=True, temperature=0.9, max_length=200)
print(output_cpu)
print("--- %s pytorch with compile CPU seconds ---" % (time.time() - start_time))
torch._dynamo.reset()
```
I ran the same code as above with the ABC backend.
Now I am capturing the FX graph for both backends.
For Inductor, the capture code is below:
```
def __call__(self, model_, inputs_):
    from torch._inductor.compile_fx import compile_fx
    from torch.fx.passes import graph_drawer

    print("CPU Graphs")
    # Dump the FX graph handed to the backend as readable text.
    model_ir = model_.print_readable(print_output=False)
    with open("pt_graph_fwd_cpu.ir", "w") as file:
        file.write(model_ir)
    # Render the same graph as a PNG.
    gd = graph_drawer.FxGraphDrawer(model_, "f")
    pydot_graph = gd.get_dot_graph()
    pydot_graph.write_png("pt_graph_fwd_cpu.png")
    return compile_fx(model_, inputs_, config_patches=self.config)
```
For the custom backend:
```
from typing import List

from torch._dynamo import register_backend
from torch._subclasses import FakeTensor
from torch.fx import GraphModule


@register_backend
def ABC_backend(model: GraphModule, inputs: List[FakeTensor]):
    compiled_graph = None

    def fwd(*args):
        nonlocal model
        nonlocal compiled_graph
        if compiled_graph is None:
            # Dump the FX graph lazily, on the first call of the compiled function.
            model_ir = model.print_readable(print_output=False)
            with open("pt_graph_fwd_cpu.ir", "w") as file:
                file.write(model_ir)
            from torch.fx.passes import graph_drawer
            gd = graph_drawer.FxGraphDrawer(model, "f")
            pydot_graph = gd.get_dot_graph()
            pydot_graph.write_png("pt_graph_fwd_cpu.png")
            compiled_graph = ABCDBACKENDCLASS(model, inputs, args)
            del model
        return compiled_graph(*args)

    return fwd
```
The issue is that I am seeing different FX graphs for the two flows. What I have observed is that in the Inductor flow compile_fx is called multiple times, so all the subgraphs are merged, whereas for the ABC backend I can see it being called only once.
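To check how many times a backend is actually invoked, a minimal sketch like the one below might help. It is not my ABC backend: `counting_backend`, `captured_graphs`, `dump_dir`, and `toy` are hypothetical names used only for illustration, and the toy function stands in for `model.generate`. It records every `GraphModule` handed to the backend and writes each one to its own file, since a fixed file name such as `pt_graph_fwd_cpu.ir` would be overwritten on every call.

```python
import os
import tempfile

import torch
import torch._dynamo
import torch.fx

# Hypothetical names for illustration; not part of the ABC backend.
captured_graphs = []
dump_dir = tempfile.mkdtemp()


def counting_backend(gm: torch.fx.GraphModule, example_inputs):
    # TorchDynamo calls the backend once per captured subgraph.
    index = len(captured_graphs)
    captured_graphs.append(gm)
    # One file per call, so a later subgraph does not overwrite an earlier dump.
    path = os.path.join(dump_dir, f"pt_graph_fwd_{index}.ir")
    with open(path, "w") as file:
        file.write(gm.print_readable(print_output=False))
    return gm.forward  # run the captured graph unchanged


def toy(x):
    y = x + 1
    torch._dynamo.graph_break()  # explicit graph break, splits toy into two subgraphs
    return y * 2


torch._dynamo.reset()
compiled = torch.compile(toy, backend=counting_backend)
result = compiled(torch.ones(3))
print(f"backend called {len(captured_graphs)} time(s); dumps in {dump_dir}")
```

With the single graph break in `toy`, I would expect the backend to be called twice. Passing the same counting logic to `torch.compile(model.generate, ...)` should show whether the two flows really receive a different number of subgraphs, or whether only the last dump survives on disk.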