Hello, I am new to Inductor.
I have been wandering through the code in graph.py and scheduler.py.
As a simple example, I made a small network that ends with a GELU:
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 64)
        self.fc5 = nn.Linear(64, 64)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        x = torch.nn.functional.gelu(x)
        return x
I am currently looking into torch/_inductor/scheduler.py, specifically the __init__ of the Scheduler class. I printed out some of the nodes like this:
class Scheduler:
    @dynamo_timed
    def __init__(self, nodes: List[ir.Buffer]) -> None:
        super().__init__()
        V.graph.scheduler = self
        self.backends: Dict[torch.device, BaseScheduling] = {}
        self.post_grad_graph_id = next(_post_grad_graph_counter)

        self.available_buffer_names = {
            *V.graph.graph_inputs.keys(),
            *V.graph.constants.keys(),
            *V.graph.torchbind_constants.keys(),
        }

        self.nodes = [self.create_scheduler_node(n) for n in nodes]
        for n in range(len(self.nodes)):
            print(self.nodes[n].node)  # print the ir node each SchedulerNode wraps
When the loop reaches n = 5, it prints:
ComputedBuffer(name='buf5', layout=FixedLayout('cuda', torch.float32, size=[8, 64], stride=[64, 1]), data=Pointwise(
'cuda',
torch.float32,
def inner_fn(index):
i0, i1 = index
tmp0 = ops.load(buf4, i1 + 64 * i0)
tmp1 = ops.constant(0.5, torch.float32)
tmp2 = tmp0 * tmp1
tmp3 = ops.load(buf4, i1 + 64 * i0)
tmp4 = ops.constant(0.7071067811865476, torch.float32)
tmp5 = tmp3 * tmp4
tmp6 = ops.erf(tmp5)
tmp7 = ops.constant(1, torch.float32)
tmp8 = tmp6 + tmp7
tmp9 = tmp2 * tmp8
return tmp9
,
ranges=[8, 64],
origin_node=mul_2,
origins={mul_1, erf, mul, add, mul_2}
))
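Side note on what this inner_fn computes: it is just the exact erf-based GELU, gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), inlined elementwise over buf4 (0.7071067811865476 is 1/sqrt(2)). A quick scalar sanity check with plain math.erf, no torch needed:

```python
import math

def inner_fn_scalar(x):
    # same op sequence as the Pointwise inner_fn above, on one scalar
    tmp2 = x * 0.5                    # tmp0 * tmp1
    tmp5 = x * 0.7071067811865476     # tmp3 * tmp4, i.e. x / sqrt(2)
    tmp8 = math.erf(tmp5) + 1.0       # tmp6 + tmp7
    return tmp2 * tmp8                # tmp9

print(inner_fn_scalar(1.0))  # ~0.8413, matching torch.nn.functional.gelu at x=1
```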
This node shows up inside fused_gelu when I save the graph as an SVG file.
What is this ComputedBuffer? Is it a node of the compiler graph?
Also, what are TensorBox and StorageBox?