# Gradient descent with nested functions


This is a code question about how to extend a simple example to a more complex one using PyTorch's autograd system. I can perform gradient descent with the following simple function I have written:

```python
import torch

def gradient_descent(nll, w, lr, iterations):
    print(f"Initializing at {w}.")

    weights = [w.tolist()]
    costs = [nll(w).tolist()[0]]

    for i in range(iterations):
        nll_w = nll(w)
        nll_w.backward()

        # gradient step; update outside the graph, then zero the
        # gradient so it does not accumulate across iterations
        with torch.no_grad():
            w -= lr * w.grad
        w.grad.zero_()

        weights.append(w.tolist())
        costs.append(nll(w).tolist()[0])

    return weights, costs
```


You just define a cost function (in this case, the negative log-likelihood) with an unknown quantity you wish to estimate, and with an appropriate learning rate you can find a weight estimate given some sample data.
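For concreteness, a minimal `nll` for the simple case might look like the sketch below. The Gaussian likelihood and the sample data are my own illustrative assumptions, not part of the original problem:

```python
import torch

# Illustrative sample data (assumed): draws from a normal distribution
data = torch.tensor([1.9, 2.1, 2.4, 1.8, 2.2])

def nll(w):
    # Negative log-likelihood of the data under N(w, 1),
    # dropping the additive constant; returned as a 1-element tensor
    return 0.5 * ((data - w) ** 2).sum().unsqueeze(0)

w = torch.tensor(0.0, requires_grad=True)
loss = nll(w)
loss.backward()
print(w.grad)  # gradient of the NLL with respect to w
```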

I am trying to do something similar for a cost function with nested variables, and I am not sure how. I have the following (fictitious) cost function:

$C(\theta, a, b) = \ln(5) - a^2 - \sin(5) - b^3$

where $a = \frac{\theta^2-5}{3}$ and $b = \frac{\theta^3-6}{7}$. Thus my cost function contains variables I wish to find minimizing values for that are themselves expressions in $\theta$.
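Written directly in PyTorch, with the intermediates computed from $\theta$ inside the function, the nested structure looks like this (my own sketch of the problem, not a solution):

```python
import math

import torch

def cost(theta):
    # Intermediates are themselves functions of theta
    a = (theta ** 2 - 5) / 3
    b = (theta ** 3 - 6) / 7
    # C(theta, a, b) = ln(5) - a^2 - sin(5) - b^3
    return math.log(5) - a ** 2 - math.sin(5) - b ** 3

theta = torch.tensor(2.0, requires_grad=True)
C = cost(theta)
C.backward()
print(theta.grad)  # dC/dtheta, chained through a and b
```

Here autograd does propagate through $a$ and $b$ to give $\frac{\partial C}{\partial \theta}$, but the intermediates only exist inside the function body.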

I want to perform gradient descent where autograd determines the partial derivatives $\frac{\partial C}{\partial \theta}$, $\frac{\partial C}{\partial a}$, and $\frac{\partial C}{\partial b}$ and uses them to find the minimum.

The issue is that if I define my cost function in terms of the three variables as shown above, the cost function has no knowledge that $a$ and $b$ are actually expressions containing $\theta$. But if I expand out my cost function, replacing $a$ and $b$ with their definitions, then the cost function no longer uses the inputs $a$ and $b$ anywhere and their gradients cannot be calculated.
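To make the issue concrete, here is a minimal reproduction of the nested form: $\theta$ receives a gradient, but the intermediates do not, because (as far as I understand) autograd only populates `.grad` on leaf tensors and discards gradients of intermediate results:

```python
import math

import torch

theta = torch.tensor(2.0, requires_grad=True)

# a and b are computed from theta, so they are non-leaf tensors
a = (theta ** 2 - 5) / 3
b = (theta ** 3 - 6) / 7
C = math.log(5) - a ** 2 - math.sin(5) - b ** 3
C.backward()

print(theta.grad)  # populated: theta is a leaf tensor
print(a.grad)      # None: a is a non-leaf, its gradient is discarded
```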

So my question is about how to modify my code to achieve this.