Loss is not decreasing for custom loss function

I am training a logistic regression model with a customised loss function, shown below:

def loss_calc(self, predictions, labels, instance_predictions, similarities_btw_instances):
        """
        calculates loss
        :param predictions: sigmoid outputs for all groups from the training data.
        :param labels: ground truth, 0 indicating a negative news item, and 1 a positive one
        :param instance_predictions: sigmoid outputs for all individual sentences from the training data.
        :param similarities_btw_instances: similarities between sentences' vector representations, using rbf kernel
        :return: calculated loss
        """
        N = len(instance_predictions)
        K = 14.7 # average size of groups

        diff_btw_predictions = torch.cartesian_prod(instance_predictions.view(-1), instance_predictions.view(-1))

        squared_diff = map(lambda x: (x[0] - x[1]) ** 2, diff_btw_predictions)
        squared_diff = list(squared_diff)
        squared_diff = torch.reshape(torch.Tensor(squared_diff), (len(instance_predictions), len(instance_predictions)))
        squared_diff.requires_grad = True
        first_term = torch.mul(similarities_btw_instances, squared_diff)
        first_term_loss = torch.sum(first_term)  # requires grad = True

        second_term_temp = []
        for pred, label in zip_longest(predictions, labels):  # zip_longest is from itertools
            try:
                second_term_temp.append((pred - label)**2)
                second_term = torch.cat(second_term_temp)
            except TypeError:
                pass
        second_term_loss = torch.sum(second_term)
        first_loss = 1/pow(N, 2) * first_term_loss
        second_loss = self.trade_off/K * second_term_loss
        loss = first_loss.add(second_loss)
        return loss

and train with SGD with momentum:
optimizer = torch.optim.SGD(model.parameters(), lr=self.lr, momentum=self.momentum)
with these params: lr=0.05, num_iter=50, momentum=0.8
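For context, the training loop looks roughly like this (just a sketch; the loader variables and the model's forward signature are simplified placeholders, not my exact code):

for epoch in range(num_iter):
    for groups, sentences, similarities, labels in train_loader:
        optimizer.zero_grad()
        # forward pass: group-level and sentence-level sigmoid outputs
        predictions, instance_predictions = model(groups, sentences)
        loss = model.loss_calc(predictions, labels, instance_predictions, similarities)
        loss.backward()
        optimizer.step()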

However, the loss stays at around 1.5 and does not decrease. Can anyone help me figure out whether there is something wrong with my implementation? I am new to PyTorch.

p.s. this is the proposed loss function I want to implement:
[image: proposed loss function, 441x500]
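In plain notation (this is my reading reconstructed from the code above, in case the image does not render), the loss I am trying to implement is:

loss = (1 / N^2) * sum_{i,j} W_ij * (p_i - p_j)^2  +  (lambda / K) * sum_g (P_g - y_g)^2

where p_i are the per-sentence sigmoid outputs, W_ij the RBF similarities between sentences i and j, P_g and y_g the group prediction and label, lambda the trade-off parameter, N the number of sentences, and K the average group size.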

You are breaking the computation graph when you use list() and/or torch.Tensor(): rebuilding the intermediate results as a new tensor from Python values creates a leaf tensor with no gradient history, so gradients cannot flow back through it to the model parameters.
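Here is a minimal illustration (toy tensors, not your exact code):

import torch

x = torch.randn(4, requires_grad=True)
y = x * 2                                    # still attached to the graph

# rebuilding a tensor from Python numbers detaches it from the graph
detached = torch.Tensor([v.item() for v in y])
print(detached.grad_fn)                      # None -> no gradient will reach x

# staying with tensor operations keeps the graph intact
attached = torch.stack(list(y)) ** 2
print(attached.grad_fn)                      # <PowBackward0 ...>
attached.sum().backward()
print(x.grad)                                # gradients flow back to x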

        N = len(instance_predictions)
        K = 14.7 # average size of groups

        # pairwise squared differences between instance predictions,
        # computed with tensor operations so the graph stays intact
        instance_predictions_matrix = \
            instance_predictions.reshape(len(instance_predictions), -1).repeat(
                1, len(instance_predictions)).view(len(instance_predictions), len(instance_predictions))
        squared_diff_btw_predictions = (instance_predictions - instance_predictions_matrix)**2

        first_term = torch.mul(similarities_btw_instances, squared_diff_btw_predictions)
        first_term_loss = torch.sum(first_term)  # requires grad = True


        second_term = (predictions - labels)**2  # squared error between group predictions and labels

        second_term_loss = torch.sum(second_term)

        first_loss = 1/(N**2) * first_term_loss
        second_loss = self.trade_off/K * second_term_loss

        loss = first_loss.add(second_loss)
        
        return loss

Here’s my updated loss function. Do you think this still breaks the computation graph? And how can I check which operations are breaking it?

I am not sure about the correctness of the loss functions, but as far as I can see, the operations are not breaking the computational graph. As long as you are not creating new tensors from the outputs of the network or leaving the torch ecosystem (e.g. calling .numpy()), you won't break the computational graph.
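A quick way to check is to print .grad_fn (or .requires_grad) of the intermediate tensors inside your loss function; if it is None on a tensor that was derived from the network output, the graph was broken at that step. For example (model and the other names here stand for your own objects):

loss = model.loss_calc(predictions, labels, instance_predictions, similarities)
print(loss.requires_grad, loss.grad_fn)      # expect True and a non-None grad_fn

loss.backward()
for name, param in model.named_parameters():
    # every parameter that should learn must receive a gradient
    print(name, None if param.grad is None else param.grad.abs().sum().item())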