Shapes of FakeTensors after graph transformations on ATen IR

Hi, I am currently adding some custom ops to the forward graph at the ATen IR level (i.e., modifying the forward graph produced by AOTAutograd).

As an example, assume I have a tensor O of shape (b, s, n, d) in the unmodified graph. I then add my custom op after O, which transforms it into a tensor of shape (b, s / 2, n / 2, 4 * d).
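For concreteness, a minimal sketch of how such an op might be declared (the name myns::fuse_heads, the kernel body, and the shapes are purely illustrative, and it assumes a recent PyTorch where torch.library.custom_op is available):

```python
import torch

# Illustrative only: a custom op whose output shape differs from its input.
@torch.library.custom_op("myns::fuse_heads", mutates_args=())
def fuse_heads(x: torch.Tensor) -> torch.Tensor:
    b, s, n, d = x.shape
    # Stand-in for the real kernel; the element count is preserved.
    return x.reshape(b, s // 2, n // 2, 4 * d).clone()

# Fake (meta) kernel: tells the compiler the output shape without running the kernel.
@fuse_heads.register_fake
def _(x):
    b, s, n, d = x.shape
    return x.new_empty(b, s // 2, n // 2, 4 * d)
```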

The tensor O is also used in the backward graph. My understanding is that the backward graph is independent of the forward graph at this point, so changes to the forward graph won't affect the backward graph. Therefore, before tensor O is used in the backward graph, I add another custom operation that reshapes it to (b, s / 2, n / 2, 4 * d).
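Roughly, the backward-graph edit looks like this (a sketch; the op name and the node handles are illustrative):

```python
import torch
import torch.fx

def insert_custom_op_before_use(bwd_gm: torch.fx.GraphModule, saved: torch.fx.Node):
    """Insert the custom op right after the saved tensor O so that every
    downstream user in the backward graph consumes the transformed tensor."""
    graph = bwd_gm.graph
    with graph.inserting_after(saved):
        transformed = graph.call_function(
            torch.ops.myns.fuse_heads.default, args=(saved,)
        )
    # Rewire all users of O except the newly inserted node itself.
    saved.replace_all_uses_with(
        transformed, delete_user_cb=lambda user: user is not transformed
    )
    graph.lint()
    bwd_gm.recompile()
```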

However, this does not seem to work. I get an assertion error arising from assert_size_stride(getitem_1, (1, 32, 1024, 128), ). On inspecting the generated code, I see that the backward graph asserts the shapes and strides of its inputs at the beginning.

I tried updating node.meta['val'], which stores the FakeTensor associated with each node, but I get the same error. What is the best way to update the shapes of the FakeTensors associated with the nodes in the modified forward and backward graphs?
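For reference, this is roughly the kind of metadata update I mean: re-propagating FakeTensors over the modified graph instead of hand-editing each node (a sketch using FakeTensorProp, assuming the placeholder nodes already carry FakeTensors in meta['val']):

```python
import torch
from torch.fx.passes.fake_tensor_prop import FakeTensorProp

def refresh_fake_tensors(gm: torch.fx.GraphModule) -> None:
    """Re-propagate FakeTensors through the modified graph so that every
    node's meta['val'] reflects the shapes produced by the inserted ops."""
    placeholders = [n for n in gm.graph.nodes if n.op == "placeholder"]
    fake_inputs = [n.meta["val"] for n in placeholders]
    # Reuse the FakeTensorMode already attached to the existing FakeTensors.
    fake_mode = fake_inputs[0].fake_mode
    FakeTensorProp(gm, mode=fake_mode).propagate_dont_convert_inputs(*fake_inputs)
```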

What’s the use case? (Why do you want to insert a custom op into the forward pass?)

I have defined a custom op because the operation is not traceable by Dynamo, so I first get the forward graph and then manually insert my custom op to avoid graph breaks.

Our recommended workflow is to create a custom op, call it from your forward pass, and then let Dynamo trace it. What goes wrong there for you?
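Roughly like this (a sketch, reusing the illustrative op name from above):

```python
import torch

class MyModel(torch.nn.Module):
    def forward(self, x):
        # The custom op is called directly in the forward pass; Dynamo traces it
        # as a single opaque node, so there is no graph break.
        return torch.ops.myns.fuse_heads(x)

compiled = torch.compile(MyModel())
out = compiled(torch.randn(2, 8, 4, 16))  # -> shape (2, 4, 2, 64)
```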

Our use case is slightly different. We can receive any model as input from the user, and we want to manually add our custom ops to the model as an optimization. We found that a good way to achieve this is to get the ATen IR and look for patterns where our custom ops can be inserted.

The workflow looks like this: users provide their models and choose the optimizations they want; we compile the model, get the ATen IR, insert the custom ops into the forward graph at the ATen IR level based on the chosen optimizations, and then run it.
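Concretely, the hook point looks roughly like this (a sketch; it assumes the private torch._dynamo.backends.common.aot_autograd helper as the backend factory, and the pattern-matching pass itself is elided):

```python
import torch
from torch._dynamo.backends.common import aot_autograd

def insert_custom_ops(gm: torch.fx.GraphModule, example_inputs):
    # Pattern-match on the ATen IR here and splice in the custom ops
    # (details elided), then hand the recompiled graph back to the stack.
    gm.recompile()
    return gm

# Separate compiler hooks for the forward and backward ATen graphs from AOTAutograd.
my_backend = aot_autograd(fw_compiler=insert_custom_ops, bw_compiler=insert_custom_ops)

user_model = torch.nn.Linear(16, 16)  # stand-in for whatever model the user provides
compiled_model = torch.compile(user_model, backend=my_backend)
```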

In your recommended workflow, the user will have to make changes to their code.

I have the following questions:

  1. Is the current error that I’m seeing caused by guards, i.e., are the assertions generated by Dynamo?
  2. Is there a way to change the shape and stride assertions at the ATen IR level? Possibly via overriding or re-evaluating the shapes based on the modified graph?

What does your custom operator do? It seems weird that the addition of a custom op (that returns a Tensor of a different shape) would optimize the model.

It’s a bit difficult for me to answer the questions without a repro. Do you have a script that reproduces the issue that we could take a look at?

Apologies for the late response; I was able to find a workaround for this.

I do have some other questions, but I will create a new post for them.