I am training a logistic regression model, with customised loss as below:

```
def loss_calc(self, predictions, labels, instance_predictions, similarities_btw_instances):
    """
    calculates loss
    :param predictions: sigmoid outputs for all groups from the training data.
    :param labels: ground truth, 0 indicating a negative news item, and 1 a positive one
    :param instance_predictions: sigmoid outputs for all individual sentences from the training data.
    :param similarities_btw_instances: similarities between sentences' vector representations, using rbf kernel
    :return: calculated loss
    """
    N = len(instance_predictions)
    K = 14.7  # average size of groups
    diff_btw_predictions = torch.cartesian_prod(instance_predictions.view(-1), instance_predictions.view(-1))
    squared_diff = map(lambda x: (x[0] - x[1]) ** 2, diff_btw_predictions)
    squared_diff = list(squared_diff)
    squared_diff = torch.reshape(torch.Tensor(squared_diff), (len(instance_predictions), len(instance_predictions)))
    squared_diff.requires_grad = True
    first_term = torch.mul(similarities_btw_instances, squared_diff)
    first_term_loss = torch.sum(first_term)  # requires grad = True
    second_term_temp = []
    for pred, label in zip_longest(predictions, labels):
        try:
            second_term_temp.append((pred - label)**2)
            second_term = torch.cat(second_term_temp)
        except TypeError:
            pass
    second_term_loss = torch.sum(second_term)
    first_loss = 1/pow(N, 2) * first_term_loss
    second_loss = self.trade_off/K * second_term_loss
    loss = first_loss.add(second_loss)
    return loss
```

and train with SGD with momentum:

`optimizer = torch.optim.SGD(model.parameters(), lr=self.lr, momentum=self.momentum)`

with these params: lr=0.05, num_iter=50, momentum=0.8

However, the loss stays at around 1.5 and does not decrease. I am new to PyTorch, so can anyone help me figure out whether there is something wrong with my implementation?
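One thing I suspect, though I have not confirmed it, is the line that rebuilds `squared_diff` with `torch.Tensor(...)`. That creates a fresh tensor with no autograd history, and setting `requires_grad = True` on it only turns it into a new leaf, so the first term would send no gradient back to the model. Below is a minimal sketch of the same two-term loss written with differentiable tensor operations only. It is a standalone function rather than my method, `loss_calc_vectorized` and the `trade_off` argument are names made up for illustration, and it assumes `predictions` and `labels` are tensors of the same length (so no `zip_longest` padding is needed).

```python
import torch

def loss_calc_vectorized(predictions, labels, instance_predictions,
                         similarities_btw_instances, trade_off=1.0, K=14.7):
    """Hypothetical rewrite: same two terms, but every step stays in the autograd graph."""
    p = instance_predictions.view(-1)  # shape (N,)
    N = p.shape[0]
    # Pairwise squared differences via broadcasting: (N, 1) - (1, N) -> (N, N)
    squared_diff = (p.unsqueeze(1) - p.unsqueeze(0)) ** 2
    first_loss = (similarities_btw_instances * squared_diff).sum() / N ** 2
    # Group-level squared error (assumes one label per group prediction)
    group_preds = predictions.view(-1)
    group_labels = labels.view(-1).to(group_preds.dtype)
    second_loss = trade_off / K * ((group_preds - group_labels) ** 2).sum()
    return first_loss + second_loss

# Toy check that gradients reach the parameters (all data here is made up)
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)   # stand-in for the model weights
x = torch.randn(5, 3)                    # 5 toy sentence vectors
instance_preds = torch.sigmoid(x @ w)
# Two toy groups: sentences 0-2 and sentences 3-4, averaged per group
group_preds = torch.stack([instance_preds[:3].mean(), instance_preds[3:].mean()])
labels = torch.tensor([1.0, 0.0])
sims = torch.rand(5, 5)                  # stand-in for the RBF similarities

loss = loss_calc_vectorized(group_preds, labels, instance_preds, sims)
loss.backward()
print(loss.item(), w.grad)               # w.grad should not be None
```

If `w.grad` is populated here but the corresponding parameter gradients in my real model are `None` or zero after `loss.backward()`, that would point to the graph being cut somewhere inside the loss.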

p.s. this is the proposed loss function I want to implement:

[Image: proposed loss function]
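Written out from what the code above computes (so this is my reading of the implementation, not a transcription of the image), the objective is roughly the following. The symbols are my own shorthand: $W_{ij}$ for `similarities_btw_instances`, $\hat{y}_i$ for `instance_predictions`, $\hat{Y}_g$ and $l_g$ for the group `predictions` and `labels`, and $\lambda$ for `self.trade_off`.

```latex
L = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \, (\hat{y}_i - \hat{y}_j)^{2}
    + \frac{\lambda}{K} \sum_{g} (\hat{Y}_g - l_g)^{2}
```

Here $N$ is the number of sentences and $K = 14.7$ is the average group size, as in the code.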