Custom loss function problems

I am currently working on a project that uses an LSTM to segment time series.
I am treating it as a classification problem, so I first convert the labels of the segmentation points into per-timestep class labels (1 and 0).
However, I also want to use the segmentation-point labels in my loss, so I need to convert my model's output back into the original format and then compute the loss on that.
My question: if I do this conversion on the CPU and compute the loss between the converted result and the label, isn't that loss disconnected from the model parameters and therefore without any effect on training?
What is the correct way to implement this? Any help will be greatly appreciated.

If I understand the question correctly, you are concerned about the “conversion” of the model output and the loss computation. As long as you are not using any non-differentiable operations in this “conversion”, the new output stays attached to the computation graph, and Autograd will be able to compute the gradients of all parameters that were used to create the original output.
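For example (a minimal sketch with made-up tensors, not your actual model): a conversion built from PyTorch operations keeps a grad_fn, while a detour through numpy does not.

import torch

# hypothetical per-timestep class scores with gradient history
output = torch.randn(1, 10, 2, requires_grad=True)

# "conversion" using differentiable PyTorch ops -> still attached to the graph
probs = torch.softmax(output, dim=-1)[..., 1]   # probability of class 1 per step
print(probs.grad_fn)                            # prints a *Backward object

# detour through numpy -> detached, gradients cannot flow back
detached = torch.tensor(probs.detach().cpu().numpy())
print(detached.grad_fn)                         # prints None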

Thanks for the answer. I think my custom loss operation is not differentiable, but wouldn’t that throw an error? Currently, my code runs without any errors.

for index, item in enumerate(output):
    # hard class prediction per timestep (argmax), moved to the CPU as a numpy array
    temp = item.max(dim=1)[1]
    temp = temp.cpu().numpy().squeeze()

    # collect the indices where the predicted class switches from 1 to 0
    prev = np.nan
    change = []
    for index2, item2 in enumerate(temp):
        if index2 == 0:
            prev = item2
        if prev == 1 and item2 == 0:
            change.append(index2)
        prev = item2

    # pad the shorter array with zeros so both have the same length
    change_np = np.array(change)
    label2_np = np.array(label2[index])
    if len(change) > len(label2[index]):
        # pad label
        tt = len(change) - len(label2[index])
        label2_np = np.concatenate((label2_np, np.zeros(tt)))
    else:
        # pad output
        tt2 = len(label2[index]) - len(change)
        change_np = np.concatenate((change_np, np.zeros(tt2)))

    loss_temp = sm_l1_loss(torch.Tensor(change_np), torch.Tensor(label2_np))
    loss_arr.append(loss_temp)

Here is what I did. Thank you so much again!

Are you sure the backward operation on loss_temp isn’t raising an error?
Since you are currently using numpy operations and are only wrapping the results in tensors, no computation graph would be created.
Could you post an executable code snippet showing the backward call as well?
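As a quick check (using the names from your snippet), you could also print the grad_fn of loss_temp right after computing it; a loss built from numpy arrays wrapped in torch.Tensor will not have one:

# if this prints None / False, the loss is detached from the model parameters
print(loss_temp.grad_fn)        # None for a tensor built from numpy arrays
print(loss_temp.requires_grad)  # False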

loss = cre_loss(output.reshape(-1, 2).squeeze(), label.squeeze().reshape(-1).cuda())
for i in loss_arr:
    loss += i
loss.backward()

Here is the backward() call; this part comes right after the part I posted above. I basically combined all the losses (including the “regular” loss between the output and the label).
Is there a way to correctly implement this? I can’t really think of a way to implement this with differentiable operations. Thanks!

I assume cre_loss gives a proper loss tensor with a valid grad_fn, while loss_arr contains the losses computed with numpy operations. If that’s the case, you would be adding constants to loss, which won’t change the gradients.
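A tiny illustration of that point (with made-up values):

import torch

w = torch.tensor(2.0, requires_grad=True)
loss = (w * 3.0) ** 2              # proper loss with a grad_fn: 9 * w**2

constant = torch.tensor(5.0)       # e.g. a value wrapped from a numpy result
total = loss + constant            # backward() still runs without errors ...

total.backward()
print(w.grad)                      # ... but the gradient is the same as for
                                   # `loss` alone: 18 * w = 36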

The first step would be to replace all numpy operations with PyTorch ones. If some operations are not differentiable, you could think about an approximation and implement that instead.
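As a very rough sketch (not a drop-in solution; the names soft_change_loss and change_mask are made up here): assuming output holds per-timestep logits of shape [batch, seq_len, 2], you could build a binary change mask from label2 once (outside the graph, since labels don't need gradients) and compare it against a soft 1 -> 0 transition score per timestep:

import torch
import torch.nn.functional as F

def soft_change_loss(output, change_mask):
    # output:      [batch, seq_len, 2] raw logits from the LSTM
    # change_mask: [batch, seq_len], 1.0 where the label switches from
    #              class 1 to class 0, else 0.0 (precomputed from label2)
    p1 = torch.softmax(output, dim=-1)[..., 1]        # P(class == 1) per step
    # soft indicator of a 1 -> 0 transition: previous step "on", current step "off"
    change_score = p1[:, :-1] * (1.0 - p1[:, 1:])     # [batch, seq_len - 1]
    change_score = F.pad(change_score, (1, 0))        # align with change_mask
    return F.smooth_l1_loss(change_score, change_mask)

Since everything from output to the returned loss uses PyTorch operations, this loss keeps a grad_fn and backpropagates into the model, unlike the index-based version built with numpy.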