Gradient becomes None while using Autograd

I am trying to implement Newton’s method using the following code:

import torch
from torch.autograd import Variable

# initial guess
guess = torch.tensor([1, 1], dtype=torch.float64, requires_grad=True)

# function to optimize
def my_func(x):
    alpha = Variable(x[0], requires_grad=True)
    beta = Variable(x[1], requires_grad=True)
    K = torch.tensor([[0., beta, 0.],
                      [beta, alpha, beta],
                      [0., beta, 0.]], requires_grad=True)
    c = torch.ones((3, 1), dtype=torch.float64)
    return torch.mm(torch.mm(torch.t(c), K), c)  # random function to optimize
def gradient_hessian(J, params):
    d = torch.autograd.grad(J, params, create_graph=True)
    d2 = [torch.autograd.grad(f, params, retain_graph=(i < len(d) - 1)) for i, f in enumerate(d)]
    return torch.tensor(d), torch.tensor(d2)

def newton(func, guess, runs=10):
    for _ in range(runs):
        gamma = 1
        # evaluate our function with the current value of `guess`
        value = Variable(func(guess), requires_grad=True)
        d, d2 = gradient_hessian(value, guess)
        guess.data -= gamma * torch.mm(torch.inverse(d2), d)
        print(guess.data)

    return guess.data  # return our final `guess` after `runs` updates

# call starts
result = newton(my_func, guess)
print(result)

This produces the following error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

When I set allow_unused=True, the gradient becomes None. Where am I losing the gradient? Thanks!

By wrapping tensors (which already require gradients) in a new tensor, you are detaching the new tensor from the computation graph.

E.g. my_func already returns a valid tensor, which shouldn't be wrapped in a Variable again in newton.

Also, Variables are deprecated, so you can just use tensors and set their requires_grad attribute to True. :wink:
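
To see the difference directly, here is a small made-up snippet (not from your code): torch.tensor copies the values of its inputs into a brand-new leaf tensor, while torch.stack records the operation in the graph:

import torch

x = torch.tensor([1., 1.], requires_grad=True)

# torch.tensor copies the values and creates a new leaf tensor,
# so the result has no grad_fn pointing back to x
k = torch.tensor([[0., x[1]], [x[1], x[0]]])
print(k.grad_fn)  # None -> detached from x

# torch.stack records the op in the graph instead
row = torch.stack([x[0], x[1]])
print(row.grad_fn)  # <StackBackward0 ...> -> still connected to x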

Thanks very much for the reply! I understand that forming the tensor K might have caused the issue. However, my cost function involves matrix operations (such as the one I mentioned, i.e. c_transpose * K * c, where K is composed of the two-element parameter I want to optimize). Any suggestion on how to circumvent that? TIA!

You could most likely create K by using torch.cat and torch.stack calls to avoid creating a completely new tensor.
Let me know if you get stuck. :slight_smile:
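
E.g. something along these lines (just a sketch adapted from your my_func; the 0-dim zero constant is my own helper for the fixed entries of K):

import torch

guess = torch.tensor([1, 1], dtype=torch.float64, requires_grad=True)

def my_func(x):
    alpha, beta = x[0], x[1]
    zero = torch.zeros((), dtype=x.dtype)  # 0-dim constant for the fixed entries
    # build K row by row with stack, so the entries depending on
    # alpha/beta stay connected to x in the computation graph
    K = torch.stack([
        torch.stack([zero, beta, zero]),
        torch.stack([beta, alpha, beta]),
        torch.stack([zero, beta, zero]),
    ])
    c = torch.ones((3, 1), dtype=x.dtype)
    return torch.mm(torch.mm(torch.t(c), K), c)

J = my_func(guess)
d = torch.autograd.grad(J, guess, create_graph=True)
print(d)  # (tensor([1., 4.], dtype=torch.float64, grad_fn=...),)

Since J = c^T K c just sums all entries of K (= alpha + 4*beta here), the gradient [1., 4.] confirms the graph now reaches back to guess.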