Does autograd save copies of constant buffers?

apsod · May 6, 2020, 12:09pm

If I have a PyTorch module M with a constant matrix buffer T, where M.forward(x) = x @ T, and I apply this module M at several points in a deep model, will autograd save one copy of the (constant) matrix T per invocation, or will it “realize” that the matrix T is constant and keep just one copy?

ptrblck · May 7, 2020, 4:48am

If T is registered as a buffer, it will be stored in memory once.
The output created by e.g. x @ self.T will of course be a new tensor.
Could you explain your question a bit more, as I think I might have misunderstood the problem?

apsod · May 7, 2020, 7:57am

So given a deep model f=(f_3 . M . f_2 . M . f_1), when we evaluate f(x), M is applied twice: once to f_1(x) and once to f_2(M(f_1(x))). The output at each point will of course be a new tensor.
If T is not constant, and if we needed to compute the gradient w.r.t. T and the input to M, we would need to save both T and the inputs to M in the forward pass to be able to compute the backward pass. But since T is constant, and we only need to compute the gradient w.r.t. x, the backward pass of M is itself constant (it’s just T, transposed).
The question is whether I can safely assume that autograd’s computation graph is constructed in such a way that it does not store unnecessary copies of T. (This might very well be a naive question with an obvious answer, but I’m not super knowledgeable about the inner workings of autograd.)
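For reference, the gradients of M can be written out as below. This is the standard matrix-calculus result for y = x @ T with a scalar loss L, not anything specific to PyTorch internals. It shows that the gradient w.r.t. x only needs T, while the gradient w.r.t. T only needs x:

```latex
y = xT
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\, T^{\top},
\qquad
\frac{\partial L}{\partial T} = x^{\top}\, \frac{\partial L}{\partial y}
```

So when T is constant, only the first expression is evaluated, and the input x does not have to be kept around for M's backward pass.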

ptrblck · May 7, 2020, 8:30am

Yes, Autograd should figure this out.
Here is a small example showing that leaf variables are treated in a special way: in-place modifications are not allowed if the value of the variable is needed to calculate the gradient:

import torch

# Both require gradients
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=True)

out = x * y
out.backward()

# Modify inplace
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=True)

y[0] = 2 # error
out = x * y
out.backward()
> RuntimeError: leaf variable has been moved into the graph interior

# Modify inplace of constant
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=False)

y[0] = 2 # works
out = x * y
out.backward()

In-place modification does work for “constant” tensors, however.
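To tie this back to the original question about the buffer T, here is a minimal sketch that records what autograd saves during the forward pass, using `torch.autograd.graph.saved_tensors_hooks` (available in recent PyTorch releases). The module `M`, the sizes, and the `tanh` in between are made up for illustration. The idea is that the tensors handed to the pack hook for T point at the buffer's own memory, which suggests autograd keeps references rather than copies:

```python
import torch
from torch import nn


class M(nn.Module):
    """Hypothetical module from the question: forward(x) = x @ T."""

    def __init__(self, dim):
        super().__init__()
        # Buffers are part of the module state but do not require gradients.
        self.register_buffer("T", torch.randn(dim, dim))

    def forward(self, x):
        return x @ self.T


m = M(4)
x = torch.randn(2, 4, requires_grad=True)

saved_ptrs = []  # memory addresses of everything autograd saves


def pack(t):
    # Called once for each tensor autograd saves for the backward pass.
    saved_ptrs.append(t.data_ptr())
    return t


def unpack(t):
    return t


with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    h = torch.tanh(m(x))  # first application of M
    out = m(h).sum()      # second application of M

out.backward()

# How many of the saved tensors live at the same address as the buffer T.
refs_to_T = sum(p == m.T.data_ptr() for p in saved_ptrs)
print("saved tensors sharing memory with T:", refs_to_T)
```

Comparing `data_ptr()` values only checks that the memory is shared; it is a quick sanity check, not a full memory profile.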

apsod · May 7, 2020, 9:50am

I see. And the following also fails:

x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=False)

y[0] = 2 # Works
l1 = x * y  # y is saved for the backward pass here
y[0] = 3 # Does not work: y was already saved for l1
l2 = l1 * y
l2.backward()  # the error is raised here

Which makes sense.
Thanks for the reply!
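For anyone landing here later, the failure in the last snippet can be made visible with a small sketch like the one below. It assumes the tensor version counter exposed as `_version`, which autograd uses to detect that a saved tensor was changed in place. The variable names are illustrative only:

```python
import torch

x = torch.randn(1, requires_grad=True)
y = torch.randn(1)  # "constant": requires_grad is False

l1 = x * y                    # autograd saves y to compute the gradient w.r.t. x
version_when_saved = y._version

y[0] = 3                      # in-place write bumps y's version counter
version_after_write = y._version

l2 = l1 * y

try:
    l2.backward()
    backward_failed = False
except RuntimeError as err:
    # Autograd notices that y no longer matches the version it saved for l1.
    backward_failed = True
    print("backward failed:", err)

# Writing to a clone instead leaves the saved tensor untouched.
x2 = torch.randn(1, requires_grad=True)
y2 = torch.randn(1)
l1_ok = x2 * y2
y2_new = y2.clone()
y2_new[0] = 3
(l1_ok * y2_new).backward()   # runs without error
```

The same reference-not-copy behaviour that keeps memory use low is what makes the in-place write unsafe, so cloning before modifying is one way around it.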