Custom loss function problems

I am currently working on a project that uses an LSTM to segment time series.
I am treating it as a classification problem, so I first convert the labels of the segmentation points into per-timestep class labels (1 and 0).
However, I also want to use the segmentation-point labels in my loss, so I need to convert my model's output back into the original format and then compute the loss on that.
My question: if I do this conversion on the CPU and compute the loss between the converted result and the label, isn't that loss disconnected from the model parameters and therefore without any effect on training?
What is the correct way to implement this? Any help will be greatly appreciated.

If I understand the question correctly, you are concerned about the “conversion” of the model output and the loss computation. As long as you are not using any non-differentiable operations in this “conversion”, the new output stays attached to the computation graph, and Autograd will be able to compute the gradients of all parameters that were used to create the original output.
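For example (a minimal sketch with made-up tensors, not your actual model): a conversion built from PyTorch operations keeps a grad_fn, while a detour through numpy does not.

import torch

# hypothetical per-timestep class scores with gradient history
output = torch.randn(1, 10, 2, requires_grad=True)

# "conversion" using differentiable PyTorch ops -> still attached to the graph
probs = torch.softmax(output, dim=-1)[..., 1]   # probability of class 1 per step
print(probs.grad_fn)                            # prints a *Backward object

# detour through numpy -> detached, gradients cannot flow back
detached = torch.tensor(probs.detach().cpu().numpy())
print(detached.grad_fn)                         # prints None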

Thanks for the answer. I think my custom loss operation is not differentiable, but wouldn’t that throw an error? Currently, my code runs without any errors.

for index, item in enumerate(output):
    # hard class prediction per timestep (argmax), moved to the CPU as a numpy array
    temp = item.max(dim=1)[1]
    temp = temp.cpu().numpy().squeeze()

    # collect the indices where the predicted class switches from 1 to 0
    prev = np.nan
    change = []
    for index2, item2 in enumerate(temp):
        if index2 == 0:
            prev = item2
        if prev == 1 and item2 == 0:
            change.append(index2)
        prev = item2

    # pad the shorter array with zeros so both have the same length
    change_np = np.array(change)
    label2_np = np.array(label2[index])
    if len(change) > len(label2[index]):
        # pad label
        tt = len(change) - len(label2[index])
        label2_np = np.concatenate((label2_np, np.zeros(tt)))
    else:
        # pad output
        tt2 = len(label2[index]) - len(change)
        change_np = np.concatenate((change_np, np.zeros(tt2)))

    loss_temp = sm_l1_loss(torch.Tensor(change_np), torch.Tensor(label2_np))
    loss_arr.append(loss_temp)

Here is what I did. Thank you so much again!

Are you sure the backward operation on loss_temp isn’t raising an error?
Since you are currently using numpy operations and are only wrapping the results in tensors, no computation graph would be created.
Could you post an executable code snippet showing the backward call as well?
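As a quick check (using the names from your snippet), you could also print the grad_fn of loss_temp right after computing it; a loss built from numpy arrays wrapped in torch.Tensor will not have one:

# if this prints None / False, the loss is detached from the model parameters
print(loss_temp.grad_fn)        # None for a tensor built from numpy arrays
print(loss_temp.requires_grad)  # False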

loss = cre_loss(output.reshape(-1, 2).squeeze(), label.squeeze().reshape(-1).cuda())
for i in loss_arr:
    loss += i
loss.backward()

Here is the backward() call; this part comes right after the part I posted above. I basically combined all the losses (including the “regular” loss between the output and the label).
Is there a way to correctly implement this? I can’t really think of a way to implement this with differentiable operations. Thanks!

I assume cre_loss gives a proper loss tensor with a valid grad_fn, while loss_arr contains the losses computed with numpy operations. If that’s the case, you would be adding constants to loss, which won’t change the gradients.
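A tiny illustration of that point (with made-up values):

import torch

w = torch.tensor(2.0, requires_grad=True)
loss = (w * 3.0) ** 2              # proper loss with a grad_fn: 9 * w**2

constant = torch.tensor(5.0)       # e.g. a value wrapped from a numpy result
total = loss + constant            # backward() still runs without errors ...

total.backward()
print(w.grad)                      # ... but the gradient is the same as for
                                   # `loss` alone: 18 * w = 36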

The first step would be to replace all numpy operations with PyTorch ones. If some operations are not differentiable, you could think about an approximation and implement that instead.
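As a very rough sketch (not a drop-in solution; the names soft_change_loss and change_mask are made up here): assuming output holds per-timestep logits of shape [batch, seq_len, 2], you could build a binary change mask from label2 once (outside the graph, since labels don't need gradients) and compare it against a soft 1 -> 0 transition score per timestep:

import torch
import torch.nn.functional as F

def soft_change_loss(output, change_mask):
    # output:      [batch, seq_len, 2] raw logits from the LSTM
    # change_mask: [batch, seq_len], 1.0 where the label switches from
    #              class 1 to class 0, else 0.0 (precomputed from label2)
    p1 = torch.softmax(output, dim=-1)[..., 1]        # P(class == 1) per step
    # soft indicator of a 1 -> 0 transition: previous step "on", current step "off"
    change_score = p1[:, :-1] * (1.0 - p1[:, 1:])     # [batch, seq_len - 1]
    change_score = F.pad(change_score, (1, 0))        # align with change_mask
    return F.smooth_l1_loss(change_score, change_mask)

Since everything from output to the returned loss uses PyTorch operations, this loss keeps a grad_fn and backpropagates into the model, unlike the index-based version built with numpy.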