Non-linear equation doesn't learn with Custom Loss (Continuous Ranked Probability Score - CRPS)

Hello there,

I want to train a simple non-linear equation with 4 parameters using a probabilistic loss function, the CRPS. The model predicts a distribution of outputs, which I create simply by repeatedly integrating the equation with different initial values.
If I use a very simple custom loss function, such as the mean absolute error, the pipeline works and the weights (the 4 parameter values) change during training.
But with the CRPS, I have to build the CDFs of the predictions and the observation in a rather long loss function with many operations (see below). With this function, my parameter values don't change during training. What am I doing wrong?

I appreciate any help, thank you very much!

import torch


def my_custom_loss(outputs, targets):

    """
    outputs: tensor. vector of length m, containing the ensemble predictions.
    targets: tensor. scalar, the corresponding observation.
    """

    fc = torch.sort(outputs).values
    ob = targets.clone()
    m = len(fc)

    cdf_fc = []
    cdf_ob = []
    delta_fc = []
    # iterate over consecutive pairs of the sorted ensemble members
    for f in range(len(fc) - 1):
        # both this member and the next one lie below the observation
        if (fc[f] < ob) and (fc[f + 1] < ob):
            cdf_fc.append((f + 1) * 1 / m)
            cdf_ob.append(0)
            delta_fc.append(fc[f + 1] - fc[f])
        elif (fc[f] < ob) and (fc[f + 1] > ob):
            # the observation falls between this member and the next one
            cdf_fc.append((f + 1) * 1 / m)
            cdf_fc.append((f + 1) * 1 / m)
            cdf_ob.append(0)
            cdf_ob.append(1)
            delta_fc.append(ob - fc[f])
            delta_fc.append(fc[f + 1] - ob)
        else:
            # both this member and the next one lie at or above the observation
            cdf_fc.append((f + 1) * 1 / m)
            cdf_ob.append(1)
            delta_fc.append(fc[f + 1] - fc[f])
    cdf_fc = torch.tensor(cdf_fc, dtype=torch.float, requires_grad=True)
    cdf_ob = torch.tensor(cdf_ob, dtype=torch.float, requires_grad=True)
    delta_fc = torch.tensor(delta_fc, dtype=torch.float, requires_grad=True)

    loss = torch.sum(((cdf_fc - cdf_ob) ** 2) * delta_fc)

    return loss

You are detaching the tensors from the computation graph by recreating new leaf tensors in:

    cdf_fc = torch.tensor(cdf_fc, dtype=torch.float, requires_grad=True)
    cdf_ob = torch.tensor(cdf_ob, dtype=torch.float, requires_grad=True)
    delta_fc = torch.tensor(delta_fc, dtype=torch.float, requires_grad=True)

Use torch.cat or torch.stack instead, which are differentiable.
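
For example, since only delta_fc actually depends on outputs (the cdf values are plain Python numbers and thus constants w.r.t. the parameters), something along these lines should keep the computation graph intact (a sketch based on your posted code, not tested on your data):

    # cdf_fc and cdf_ob hold plain Python numbers: constants w.r.t. the
    # parameters, so wrapping them in new tensors is fine here
    cdf_fc = torch.tensor(cdf_fc, dtype=torch.float)
    cdf_ob = torch.tensor(cdf_ob, dtype=torch.float)
    # delta_fc holds 0-dim tensors computed from fc, so stacking them
    # preserves the autograd history back to outputs
    delta_fc = torch.stack(delta_fc)

    loss = torch.sum(((cdf_fc - cdf_ob) ** 2) * delta_fc)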

Thanks a lot for the hint, @ptrblck.
As the entries in cdf_fc, cdf_ob and delta_fc are floats, I can't stack them, so I created tensors without gradients before the loop, which I fill instead of the lists:

    cdf_fc = torch.zeros_like(fc)
    cdf_ob = torch.zeros_like(fc)
    delta_fc = torch.zeros_like(fc)
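
The loop then writes into these tensors at a running index instead of appending, roughly like this (the exact index bookkeeping is sketched, since the middle branch writes two entries):

    i = 0
    for f in range(len(fc) - 1):
        if (fc[f] < ob) and (fc[f + 1] < ob):
            cdf_fc[i] = (f + 1) / m
            # cdf_ob[i] is already 0 from the initialization
            delta_fc[i] = fc[f + 1] - fc[f]  # in-place writes keep the graph
            i += 1
        elif (fc[f] < ob) and (fc[f + 1] > ob):
            cdf_fc[i] = (f + 1) / m
            cdf_fc[i + 1] = (f + 1) / m
            cdf_ob[i + 1] = 1
            delta_fc[i] = ob - fc[f]
            delta_fc[i + 1] = fc[f + 1] - ob
            i += 2
        else:
            cdf_fc[i] = (f + 1) / m
            cdf_ob[i] = 1
            delta_fc[i] = fc[f + 1] - fc[f]
            i += 1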

Now the model learns something, albeit with very bad results, but that might be a different problem.
So, does the custom loss I return not need requires_grad=True, or do I have to set the flag before returning?

The input tensor(s) should already require gradients, so you should not rewrap them into new tensors. Generally, you would create tensors with requires_grad=True only for model inputs (if the inputs themselves require gradients, which is not the common use case) and for trainable parameters, if you want to create these manually (otherwise they are created for you via nn.Parameter inside the nn.Modules).
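
As a minimal sketch of this (a hypothetical two-parameter toy model):

    import torch

    # trainable parameters: nn.Parameter sets requires_grad=True for you
    a = torch.nn.Parameter(torch.tensor(1.0))
    b = torch.nn.Parameter(torch.tensor(0.0))

    x = torch.randn(8)                # plain input, no requires_grad needed
    out = a * x + b                   # the graph starts at the parameters
    loss = (out - 2.0).abs().mean()   # built from differentiable ops only
    loss.backward()                   # gradients reach a and b without flags
    print(a.grad, b.grad)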

The input tensors to the loss function do require gradients. But when building the cumulative distribution functions inside the loss, I create new tensors from the positions of the entries of these input tensors: I order them with torch.sort and then use conditional indexing on the values.
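
For reference, the empirical CRPS of an ensemble also has an equivalent kernel (energy) form, CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, which sidesteps the explicit CDF construction and the Python loop entirely and stays differentiable throughout. A sketch (the function name is mine):

    import torch

    def crps_ensemble(outputs, target):
        # kernel (energy) form of the empirical CRPS:
        # CRPS = E|X - y| - 0.5 * E|X - X'|
        # mean absolute error between ensemble members and the observation
        term1 = torch.mean(torch.abs(outputs - target))
        # mean absolute difference between all pairs of ensemble members
        term2 = torch.mean(torch.abs(outputs.unsqueeze(0) - outputs.unsqueeze(1)))
        return term1 - 0.5 * term2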