If I have a PyTorch module **M** with a constant matrix buffer T, where `M.forward(x) = x @ T`, and I apply this module **M** at several points in a deep model, will autograd save one copy of the (constant) matrix T per invocation, or will it "realize" that the matrix T is constant and keep just one copy?
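
For concreteness, here is a minimal sketch of the setup I have in mind (the names are illustrative):

```
import torch

class M(torch.nn.Module):
    def __init__(self, T):
        super().__init__()
        # T is constant: registered as a buffer, so it is part of the
        # module's state but not a parameter, and requires_grad is False
        self.register_buffer("T", T)

    def forward(self, x):
        return x @ self.T
```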

If `T` is registered as a buffer, it will be stored in memory once. The output created by e.g. `x @ self.T` will of course be a new tensor.

Could you explain your question a bit more, as I think I might have misunderstood the problem?

So given a deep model `f = (f_3 . M . f_2 . M . f_1)`, when we evaluate `f(x)`, `M` is applied twice: once to `f_1(x)` and once to `f_2(M(f_1(x)))`. The output at each point will of course be a new tensor.
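
In code, something like this (the `f_i` are just placeholders for arbitrary submodules, and `M` is the module sketched in my first post):

```
import torch

T = torch.randn(16, 16)        # the constant matrix
m = M(T)                       # one shared instance of M
f_1 = torch.nn.Linear(16, 16)
f_2 = torch.nn.Linear(16, 16)
f_3 = torch.nn.Linear(16, 16)

def f(x):
    # the same module instance, and hence the same buffer T, is applied twice
    return f_3(m(f_2(m(f_1(x)))))
```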

If `T` is not constant, and we needed to compute the gradient w.r.t. both T and the input to M, we would need to save both T and the inputs to M in the forward pass to be able to compute the backward pass. But since T is constant, and we only need to compute the gradient w.r.t. x, the backward pass of M is itself constant (it just multiplies the incoming gradient by `T`'s transpose).
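
To spell that out, here is a hand-rolled sketch of M's forward/backward as a custom `torch.autograd.Function` (this is just the math, not how autograd implements `@` internally; the class name is made up): only T needs to be saved, and x does not.

```
import torch

class MatMulByConstT(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, T):
        # Only T is needed for the backward pass; x need not be saved,
        # since we never compute a gradient w.r.t. T
        ctx.save_for_backward(T)
        return x @ T

    @staticmethod
    def backward(ctx, grad_out):
        (T,) = ctx.saved_tensors
        # d(x @ T)/dx applied to grad_out; no gradient for T
        return grad_out @ T.t(), None

x = torch.randn(2, 4, requires_grad=True)
T = torch.randn(4, 4)
out = MatMulByConstT.apply(x, T)
out.sum().backward()
```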

The question is whether I can safely assume that autograd's computation graph is constructed in such a way that it does not store unnecessary copies of T. (This might very well be a naive question with an obvious answer, but I'm not super knowledgeable about the inner workings of autograd.)

Yes, Autograd should figure this out.
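
If you want to check what autograd actually saves, recent PyTorch versions expose `torch.autograd.graph.saved_tensors_hooks`, which lets you intercept every tensor as it is saved for the backward pass. In the sketch below the same storage pointer should show up once per application of the matmul, i.e. T is saved by reference, not copied:

```
import torch

saved_ptrs = []

def pack(t):
    saved_ptrs.append(t.data_ptr())  # record the storage address of each saved tensor
    return t

def unpack(t):
    return t

T = torch.randn(4, 4)                      # constant, requires_grad=False
x = torch.randn(2, 4, requires_grad=True)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    out = (x @ T) @ T                      # "M" applied twice

out.sum().backward()
print(saved_ptrs.count(T.data_ptr()))      # expected: 2 (same pointer both times, no copies)
```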

Here is a small example, which shows that leaf variables are treated in a special way, i.e. inplace modifications are not allowed if the value of the variable is needed to calculate the gradient:

```
import torch

# Both x and y require gradients
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=True)
out = x * y
out.backward()  # works

# Modify a grad-requiring leaf inplace
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=True)
y[0] = 2  # inplace on a leaf that requires grad
out = x * y
out.backward()
> RuntimeError: leaf variable has been moved into the graph interior

# Modify a "constant" (requires_grad=False) tensor inplace
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=False)
y[0] = 2  # works
out = x * y
out.backward()
```

The inplace modification works for "constant" tensors, however (note that here `y[0] = 2` happens before `y` is used in the graph).

I see. And the following also fails, at backward time:

```
import torch

x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=False)
y[0] = 2   # works: y has not been used in the graph yet
l1 = x * y # y is saved here, since it is needed for x's gradient
y[0] = 3   # the assignment itself succeeds, but bumps y's version counter
l2 = l1 * y
l2.backward()
> RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```

Which makes sense: y was saved in the forward pass to compute `l1`'s backward, so modifying it inplace afterwards invalidates the saved value.

Thanks for the reply!