If I have a PyTorch module M with a constant matrix buffer T, where `M.forward(x) = x @ T`, and I apply this module M at several points in a deep model, will autograd save one copy of the (constant) matrix T per invocation, or will it "realize" that the matrix T is constant and keep just one copy?
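For concreteness, the module might look like this (a minimal sketch; the names are illustrative):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self, T: torch.Tensor):
        super().__init__()
        # register_buffer stores T as part of the module's state,
        # but not as a trainable parameter (no gradient is tracked for T)
        self.register_buffer("T", T)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.T
```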
If T is registered as a buffer, it will be stored in memory once.
The output created by e.g. `x @ self.T` will of course be a new tensor.
Could you explain your question a bit more, as I think I might have misunderstood the problem?
So given a deep model `f = (f_3 . M . f_2 . M . f_1)`, when we evaluate `f(x)`, M is applied twice: once to `f_1(x)` and once to `f_2(M(f_1(x)))`. The output at each point will of course be a new tensor.
If T were not constant and we needed to compute the gradient w.r.t. both T and the input to M, we would need to save both T and the inputs to M in the forward pass to be able to compute the backward pass. But since T is constant and we only need the gradient w.r.t. x, the backward pass of M is itself constant (it is determined entirely by T).
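To illustrate: for `out = x @ T`, the gradient w.r.t. x is just the upstream gradient times T transposed, so T is the only thing the backward pass needs (a small sketch with made-up shapes):

```python
import torch

T = torch.randn(4, 3)                  # constant, requires_grad=False
x = torch.randn(2, 4, requires_grad=True)

out = x @ T
g = torch.randn_like(out)              # some upstream gradient dL/dout
out.backward(g)

# dL/dx = dL/dout @ T.T -- the backward of M only needs T
assert torch.allclose(x.grad, g @ T.T)
```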
The question is whether I can safely assume that autograd's computation graph is constructed in such a way that it does not store unnecessary copies of T. (This might very well be a naive question with an obvious answer, but I'm not super knowledgeable about the inner workings of autograd.)
Yes, Autograd should figure this out.
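One way to convince yourself (using the `_saved_*` inspection attributes on `grad_fn` nodes, available in more recent PyTorch versions; treat this as a sketch): each matmul backward node saves a reference to T rather than a copy, so the data pointers all match.

```python
import torch

T = torch.randn(4, 4)                     # constant matrix, requires_grad=False
x = torch.randn(2, 4, requires_grad=True)

h1 = x @ T        # first application of M
h2 = h1 @ T       # second application of M

# Each backward node saves a reference to the same storage, not a copy:
assert h1.grad_fn._saved_mat2.data_ptr() == T.data_ptr()
assert h2.grad_fn._saved_mat2.data_ptr() == T.data_ptr()
```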
Here is a small example, which shows that leaf variables are treated in a special way, i.e. in-place modifications are not allowed if the value of the variable is needed to calculate the gradient:
```python
# Both require gradients
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=True)
out = x * y
out.backward()

# In-place modification of a leaf that requires grad
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=True)
y.fill_(2.)  # error
out = x * y
out.backward()
# RuntimeError: leaf variable has been moved into the graph interior
# (newer PyTorch versions raise the error at the in-place op itself)

# In-place modification of a "constant" tensor
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=False)
y.fill_(2.)  # works
out = x * y
out.backward()
```
The in-place modification does work for "constant" tensors, though.
I see. And the following also fails:
```python
x = torch.randn(1, requires_grad=True)
y = torch.randn(1, requires_grad=False)
y.fill_(2.)  # works
l1 = x * y
y.fill_(3.)  # modifies a value that is needed for backward
l2 = l1 * y
l2.backward()
# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation
```
Which makes sense.
Thanks for the reply!