Is it possible to predict how much memory is required to store the graph that gets built when I call torch.autograd.grad with create_graph=True?
Here is a concrete example. Suppose I have a simple neural net with layer sizes 100 x 10 x 1, so my feature dimension is 100 and the output is a scalar per example. I want to add the gradients of the output with respect to my inputs to my loss. Let's further suppose I have 50 examples, so my input, call it x, has shape (50, 100). I then compute:
x = torch.rand(50, 100, device=device, requires_grad=True)
predictions = model(x)  # shape (50, 1)
# autograd.grad returns a tuple; retain_graph defaults to True when create_graph=True
gradients, = torch.autograd.grad(predictions.sum(), x, create_graph=True)
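For completeness, model and device are set up along these lines before the snippet above (a minimal sketch; the Tanh activation is an arbitrary choice on my part):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 100 -> 10 -> 1 to match the sizes described above; activation is arbitrary
model = nn.Sequential(
    nn.Linear(100, 10),
    nn.Tanh(),
    nn.Linear(10, 1),
).to(device)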
How much memory does each of these two calls need?
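I can measure what actually gets allocated empirically, e.g. with something like the sketch below on a CUDA device (it reuses model and x from above), but what I'm after is a way to predict these numbers from the layer sizes and batch size:

import torch

def allocated_delta(fn):
    # Run fn and report how many bytes of CUDA memory it left allocated
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    out = fn()
    torch.cuda.synchronize()
    return out, torch.cuda.memory_allocated() - before

predictions, fwd_bytes = allocated_delta(lambda: model(x))
(gradients,), grad_bytes = allocated_delta(
    lambda: torch.autograd.grad(predictions.sum(), x, create_graph=True)
)
print(f"forward: {fwd_bytes} bytes, autograd.grad: {grad_bytes} bytes")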