How may one detach only a certain variable in the computational graph?

I have a weird objective function I want to optimize for research reasons. It is given by a sum of neural network derivatives with respect to the input. For simplicity, let's consider only two one-dimensional inputs, x and y, and assume the model has already been defined as a NN. Then I can obtain the derivatives with respect to the inputs as follows:

y = torch.tensor([4], dtype=torch.float32, requires_grad=True)
x = torch.tensor([1], dtype=torch.float32, requires_grad=True)

dx = torch.autograd.grad(model(x), x, create_graph=True)[0]
dy = torch.autograd.grad(model(y), y, create_graph=True)[0]
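(For concreteness, model here could be any small network mapping a one-dimensional input to a scalar; the definition below is just a hypothetical placeholder so the snippet above runs.)

import torch
import torch.nn as nn

# Hypothetical stand-in model: any NN mapping a 1-D input to a scalar works.
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.Tanh(),
    nn.Linear(16, 1),
)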

Let’s say our objective function in this case is:

obj = dx + dy

We would now like to optimize this objective function with respect to the parameters of our model (the NN).

This can be done by taking gradient descent steps with respect to the model parameters (the following code shows only one step).

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

optimizer.zero_grad()
obj.backward()
optimizer.step()

The problem I am facing occurs when trying to perform batched stochastic gradient descent, where the batch size refers to the number of terms in the summation of our objective function. In the running example there are only two terms, dx and dy, so one can choose batch_size = 1 to select a single term of the summation with respect to which to compute the gradient for the optimization step. My naive implementation runs into a problem:

batch_size = 1
num_epochs = 100

# sometensors holds the precomputed terms of the objective, i.e. [dx, dy]
for epoch in range(num_epochs):
    for i in range(0, 2, batch_size):
        optimizer.zero_grad()

        batch = sometensors[i:i + batch_size]
        sum_of_tensors = torch.stack(batch).sum(dim=0)

        sum_of_tensors.backward()
        optimizer.step()

After the first iteration of the loop I am hit with the error 'Trying to backward through the graph a second time'. I have come to think this is because, when I call .backward(), gradients are computed through the whole computational graph, including the inputs, since they are part of the graph, so setting optimizer.zero_grad() doesn't help. So, after having computed the derivatives with respect to the inputs, is there a way to detach ONLY the inputs from the computational graph (keeping the parameters)? I guess that might solve the 'Trying to backward through the graph a second time' problem; other solutions or ideas are more than welcome.

Thanks in advance.

If you’re trying to do something second-order, create_graph=True may be what you want?

Thank you. Unfortunately, I have already tried that, and at the second iteration of the loop I simply get a different error when calling .backward(): 'one of the variables needed for gradient computation has been modified by an inplace operation'. I suspect this is again connected to the fact that backward() goes through the whole computational graph.

You can specify the inputs= argument to backward() (just like you do for autograd.grad) to indicate which tensors should get a .grad, if you want to compute gradients w.r.t. specific tensors only.
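Something along these lines (an untested sketch, reusing dx, dy and optimizer from your snippets):

# Restrict the backward pass to the model parameters, so no gradients
# are accumulated on the input tensors x and y.
obj = dx + dy
obj.backward(inputs=list(model.parameters()))
optimizer.step()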

The graph is considered immutable once you have built it, so you shouldn't think about detaching parts of it after the fact.

In case you still need to get around the 'one of the variables needed for gradient computation has been modified by an inplace operation' error, you can use the torch.autograd API (see 'Automatic differentiation package - torch.autograd' in the PyTorch documentation).
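Putting the suggestions together, a minimal untested sketch of the batched loop could look like the following (assuming model, optimizer, x and y are defined as in the original post). The derivative terms are recomputed each iteration so that every backward() call runs on a fresh graph matching the current parameters, and inputs= keeps gradient accumulation on the model parameters only:

# Untested sketch: rebuild the derivative terms every mini-batch so that
# backward() never reuses a freed graph or one whose parameters have
# since been updated in place by optimizer.step().
inputs = [x, y]
batch_size = 1
num_epochs = 100

for epoch in range(num_epochs):
    for i in range(0, len(inputs), batch_size):
        optimizer.zero_grad()

        # Recompute the summation terms for this mini-batch.
        batch_terms = [
            torch.autograd.grad(model(inp), inp, create_graph=True)[0]
            for inp in inputs[i:i + batch_size]
        ]
        loss = torch.stack(batch_terms).sum()

        # Only accumulate gradients on the model parameters.
        loss.backward(inputs=list(model.parameters()))
        optimizer.step()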