Getting a zero grad in non-linear regression

Hello everybody! I am working on code that fits a theoretical curve to a set of observational data points. This could easily be done with least squares, but I have noticed that non-linear regression in PyTorch is a good and sophisticated way of doing it. The problem is that I don’t know of any example code like this. The idea came to me while watching a video about linear regression in PyTorch, which is the only kind of regression I know how to do there. What I want is to find the values of the free parameters for which the theoretical curve best models the data.

Let me explain the code. The class that defines the model is

class Mass_and_v(nn.Module):
    def __init__(self, param):
        super().__init__()
        self.parameter = nn.Parameter(data=param, requires_grad=True)

    # Get v_circ_tot. Define forward() rather than __call__() so that
    # nn.Module.__call__ still runs its hooks before dispatching here.
    def forward(self, x):
        r, mass_rar, _ = rar.model(self.parameter[3:], gradient_tracking=True)
        r_max = r[-1]
        mass_max = torch.max(mass_rar)
        x_spline = x[x <= r_max]
        mass_spline = chs.interp(r, mass_rar, x_spline)
        # Contribution shared by both branches of torch.where below.
        v_sq = v_circ_b(x, self.parameter[0])**2 + v_circ_MN(x, self.parameter[1:3])**2
        v_circ = torch.where(x <= r_max,
                             torch.sqrt(v_sq + G_u*mass_spline/x),
                             torch.sqrt(v_sq + G_u*mass_max/x))
        return v_circ

class FitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.nonlinear_stack = Mass_and_v(param_t)

    def forward(self, x):
        output = self.nonlinear_stack(x)
        return output

Then, I instantiate the model object through

model = FitModel()

The external functions that I use are rar.model, v_circ_b and v_circ_MN, all of which are correctly coded using torch (let me know if you need to see the code of these functions).

So, when I run

for i in range(n_epoch):
    # Reset the gradients accumulated in the previous iteration.
    # (`optimizer` is an assumed name for the torch.optim.Adam instance.)
    optimizer.zero_grad()
    pred = model(inputs)
    # The loss function is a nn.MSELoss()
    loss_fn = loss(pred, targets)
    # Backpropagate the error through the network. Backward propagation is
    # kicked off when we call .backward() on the error tensor. Autograd then
    # calculates and stores the gradients for each model parameter in the
    # parameter’s .grad attribute.
    loss_fn.backward()
    # Call .step() to run the Adam update. The optimizer adjusts each parameter
    # using its gradient stored in .grad.
    optimizer.step()
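
For comparison, here is a minimal, self-contained sketch of the same kind of loop on a toy problem. The model `ExpDecay`, the synthetic data, and the learning rate are all hypothetical stand-ins chosen for illustration; they are not the rar-based model from this post.

```python
import torch
from torch import nn

torch.manual_seed(0)

class ExpDecay(nn.Module):
    """Hypothetical toy curve y = a * exp(-b * x), standing in for the real model."""
    def __init__(self, init):
        super().__init__()
        self.parameter = nn.Parameter(init.clone())

    def forward(self, x):
        a, b = self.parameter[0], self.parameter[1]
        return a * torch.exp(-b * x)

# Synthetic, noise-free "observations" generated with a = 2.0 and b = 0.5.
inputs = torch.linspace(0.0, 5.0, 50)
targets = 2.0 * torch.exp(-0.5 * inputs)

model = ExpDecay(torch.tensor([1.0, 1.0]))
loss = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

initial_loss = loss(model(inputs), targets).item()

for i in range(2000):
    optimizer.zero_grad()          # clear gradients from the previous step
    pred = model(inputs)
    loss_value = loss(pred, targets)
    loss_value.backward()          # fills model.parameter.grad
    optimizer.step()               # Adam update using .grad

final_loss = loss(model(inputs), targets).item()
fitted = model.parameter.detach()
print(fitted, final_loss)
```

Because every operation between `self.parameter` and the loss is a differentiable torch operation, both entries of `model.parameter.grad` are non-zero during training.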

I get a zero gradient for the self.parameter[4:] entries that are used inside rar.model. Do you know what the problem could be?
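
A side note on reading that gradient: when a single `nn.Parameter` is sliced, the gradient is stored on the whole parameter, and entries that never reach the loss show up as zeros, whereas a parameter that is not used at all has `.grad` equal to `None`. The sketch below illustrates the difference with hypothetical tensors `p` and `q`; it does not use the model from this post.

```python
import torch
from torch import nn

p = nn.Parameter(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]))
q = nn.Parameter(torch.tensor([1.0]))  # hypothetical parameter that is never used

# Only the first three entries of p contribute to the output.
out = (p[:3] ** 2).sum()
out.backward()

grad_p = p.grad  # stored on the full parameter, not on the slices
grad_q = q.grad  # stays None because q never entered the graph

print(grad_p)
print(grad_q)
```

So zeros in part of `.grad` suggest that those entries have no differentiable path to the loss, or that the true derivative is zero there.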

Check whether any of the internal functions is (accidentally) detaching a tensor from the computation graph (e.g. by using a non-differentiable operation). You can verify this by inspecting the .grad_fn of intermediate tensors.
If one of them shows None, that particular tensor is not attached to a graph. Users often try to fix this by rewrapping the tensor and setting requires_grad=True again, which won’t fix the issue: the rewrapped tensor is a new leaf with no link back to the original parameters.
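
A minimal sketch of that check, using hypothetical tensors `w`, `a`, `b` and `c` rather than anything from rar.model, and using an explicit `.detach()` to stand in for an accidental detach:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

a = w * 3.0                          # differentiable op: a.grad_fn is set
b = a.detach()                       # stand-in for an accidental detach
c = b.clone().requires_grad_(True)   # "rewrapping": creates a new leaf tensor

for name, t in [("a", a), ("b", b), ("c", c)]:
    print(name, "grad_fn:", t.grad_fn, "requires_grad:", t.requires_grad)

# Backpropagating through c reaches c itself, but never w.
(c ** 2).sum().backward()
print("c.grad:", c.grad)
print("w.grad:", w.grad)
```

Here `c` requires gradients and receives one, but `w.grad` stays `None` because the path from `w` was cut at `b`. Printing `.grad_fn` at successive points inside the forward pass is one way to locate where the graph is lost.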

Hi @ptrblck,
Thank you for your answer! I will check that.