I have a weird objective function I want to optimize for research reasons. It is a sum of derivatives of a neural network with respect to its inputs. For simplicity, let's consider only two one-dimensional inputs, x and y, and take the model to be an already-defined NN. I can then obtain the derivatives with respect to the inputs as follows:
```python
# (initial values omitted; any values work here)
y = torch.tensor(..., dtype=torch.float32, requires_grad=True)
x = torch.tensor(..., dtype=torch.float32, requires_grad=True)

# torch.autograd.grad returns a tuple, hence the [0]
dx = torch.autograd.grad(model(x), x, create_graph=True)[0]
dy = torch.autograd.grad(model(y), y, create_graph=True)[0]
```
Let’s say our objective function in this case is:
```python
obj = dx + dy
```
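To make this concrete, here is a self-contained version of the setup above. The architecture and the input values are arbitrary placeholders of my own choosing; only the structure matters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder model: any small NN with 1-dim input and output works.
model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))

# Placeholder input values.
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

# Derivatives of the network output with respect to its inputs.
# torch.autograd.grad returns a tuple, hence the [0].
dx = torch.autograd.grad(model(x), x, create_graph=True)[0]
dy = torch.autograd.grad(model(y), y, create_graph=True)[0]

# The objective: a single-element tensor that still carries a graph
# back to the model parameters thanks to create_graph=True.
obj = dx + dy
```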
We would now like to optimize this objective function with respect to the parameters of our model (the NN). This can be done by taking gradient-descent steps on the model parameters (the following code shows only one step):
```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
obj.backward()
optimizer.step()
```
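Put together, one full optimization step runs fine (again with a placeholder model and placeholder inputs); comparing the parameters before and after the step confirms that gradients really do flow from the derivative-based objective into the weights:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder model and inputs, just to make the step runnable.
model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

dx = torch.autograd.grad(model(x), x, create_graph=True)[0]
dy = torch.autograd.grad(model(y), y, create_graph=True)[0]
obj = dx + dy

# Snapshot the parameters before the step.
params_before = [p.clone() for p in model.parameters()]

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
obj.backward()  # gradients reach the parameters via create_graph=True
optimizer.step()

# At least some parameters should have moved.
changed = any(
    not torch.equal(p_before, p_after)
    for p_before, p_after in zip(params_before, model.parameters())
)
```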
The problem I am facing occurs when trying to perform mini-batch stochastic gradient descent, where the batch size refers to the number of terms in the summation of our objective function. In the example we are following there are only two terms, dx and dy, so one can choose batch_size = 1 to select a single term of the summation with respect to which to compute the gradient for the optimization step. My naive implementation runs into a problem:
```python
batch_size = 1
num_epochs = 100
for epoch in range(num_epochs):
    for i in range(0, 2, batch_size):
        optimizer.zero_grad()
        batch = sometensors[i:i + batch_size]  # sometensors = [dx, dy]
        sum_of_tensors = torch.stack(batch).sum(dim=0)
        sum_of_tensors.backward()
        optimizer.step()
```
After the first epoch I am hit with the error 'Trying to backward through the graph a second time'. I have come to think this happens because, when I call .backward(), gradients are propagated through the whole computational graph, including the inputs, since they are part of that graph, so calling optimizer.zero_grad() doesn't help. So, after having computed the derivatives with respect to the inputs, is there a way to detach ONLY the inputs from the computational graph (keeping the parameters)? I suspect that would solve the 'Trying to backward through the graph a second time' error, but other solutions or ideas are more than welcome.
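For reference, here is a minimal self-contained reproduction (model and inputs are placeholders). Note that the second epoch fails even with optimizer.step() commented out, so the parameter update itself is not what breaks; it is re-using dx and dy after their graphs have been freed by the first backward:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder model and inputs; any small NN reproduces the issue.
model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

dx = torch.autograd.grad(model(x), x, create_graph=True)[0]
dy = torch.autograd.grad(model(y), y, create_graph=True)[0]
sometensors = [dx, dy]  # the terms of the summation

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

err = ""
batch_size = 1
try:
    for epoch in range(2):  # two epochs are enough to trigger the error
        for i in range(0, 2, batch_size):
            optimizer.zero_grad()
            batch = sometensors[i:i + batch_size]
            sum_of_tensors = torch.stack(batch).sum(dim=0)
            sum_of_tensors.backward()  # frees this term's graph
            # optimizer.step()  # commented out: the error occurs regardless
except RuntimeError as e:
    err = str(e)
print(err)
```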
Thanks in advance.