Feature embeddings for each user and item are identical

Hello,

I am working on a recommendation problem and have implemented a graph neural network. For optimization, I have chosen Bayesian Personalized Ranking (BPR), which essentially tries to maximize the difference between a user's predicted score for items they have interacted with and their score for items they have not.
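For a single (user, positive item, negative item) triple, BPR maximizes ln σ(x_up - x_un), where x_up and x_un are the predicted scores for the positive and negative item (here, dot products between the user embedding and the item embeddings).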
So the loss function looks like this:

loss = torch.sum(torch.mul(u, p - n))  # dot product u . (p - n)
loss = F.logsigmoid(loss)

Here u is the user embedding; p and n are the embeddings of the positive (interacted) and negative (not interacted) items.

To turn this into a minimization problem, we negate the loss and then use a standard optimizer (e.g. gradient descent):

loss_sum += torch.neg(loss)
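
For reference, here is a minimal vectorized sketch of the same step, assuming the index triples are batched so that u, p and n are (batch, dim) matrices (the names are illustrative, not my actual code):

import torch
import torch.nn.functional as F

def bpr_loss(u, p, n):
  # u, p, n: (batch, dim) embeddings of users, positive and negative items
  scores = torch.sum(u * (p - n), dim=-1)  # per-triple dot product u . (p - n)
  return -F.logsigmoid(scores).mean()      # negated, so minimizing widens the score gap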

However, while training, the feature embeddings of the users and items become more similar after each epoch. In the end I have this:

tensor([[-1.6928e+07, -3.8487e+07,  2.8422e+05,  ..., -2.1248e+06,
          8.2496e+04, -2.3038e+03],
        [ 1.3135e+08, -3.8487e+07,  6.8238e+05,  ..., -9.7038e+06,
          8.2496e+04, -5.4265e+03],
        [ 2.1087e+07, -3.8487e+07,  3.8716e+05,  ..., -4.0844e+06,
          8.2496e+04, -3.1112e+03],
        ...,
        [-2.8774e+07, -3.8487e+07,  2.5270e+05,  ..., -1.5249e+06,
          8.2496e+04, -2.0567e+03],
        [-2.8796e+07, -3.8487e+07,  2.5270e+05,  ..., -1.5249e+06,
          8.2496e+04, -2.0567e+03],
        [-2.8796e+07, -3.8487e+07,  2.5270e+05,  ..., -1.5249e+06,
          8.2496e+04, -2.0567e+03]], grad_fn=<...>)

So here is what I think is happening. The reason the embedding vectors become so alike is that the model is actually minimizing the difference between the positive and negative items, hence we get an embedding matrix with near-identical rows. My confusion is that, since I negated the loss, the model should be trying to maximize that difference instead.
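
For what it's worth, a small standalone check of the gradient direction (toy tensors, not my actual model) suggests that minimizing the negated loss does push u toward (p - n):

import torch
import torch.nn.functional as F

u = torch.randn(8, requires_grad=True)
p = torch.randn(8)
n = torch.randn(8)

loss = -F.logsigmoid(torch.dot(u, p - n))
loss.backward()

# d/ds of -logsigmoid(s) is -(1 - sigmoid(s)), which is negative, so
# u.grad = -(1 - sigmoid(s)) * (p - n) points opposite to (p - n):
# a gradient *descent* step moves u toward (p - n), increasing u . (p - n)
print(u.grad)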

Here is my code:

import time

import torch
import torch.nn.functional as F

for epoch in range(EPOCH):
  model.train()
  print("epochs: {}/{}".format(epoch + 1, EPOCH))
  t1 = time.time()
  for data in train_ldr:

    data = data.to(device)

    optimizer.zero_grad()
    xu, xa = model(data)
    print(xu, xa)  # debugging: this prints the embedding matrices shown above

    # accumulate the BPR loss over sampled (user, positive, negative) triples
    loss_sum = 0
    for u_index, pos_index, neg_index in Sample_BPR_generator(train.edge_index, group_train, new_frame_train):
      u = xu[u_index]
      p = xa[pos_index]
      n = xa[neg_index]
      # BPR loss: log sigmoid of the score gap u . (p - n) ...
      loss = torch.sum(torch.mul(u, p - n))
      loss = F.logsigmoid(loss)
      # ... negated, so that minimizing widens the gap
      loss_sum += torch.neg(loss)

    loss_sum.backward()
    optimizer.step()
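
For context, Sample_BPR_generator yields one (user, positive, negative) index triple at a time. The idea is roughly this simplified sketch (a hypothetical stand-in, not my exact implementation, which also uses group_train and new_frame_train):

import random

def sample_bpr_triples(edge_index, num_items):
  # for every observed (user, item) edge, pair the positive item with a
  # uniformly sampled item the user has not interacted with
  interactions = {}
  for u, i in edge_index.t().tolist():
    interactions.setdefault(u, set()).add(i)
  for u, i in edge_index.t().tolist():
    j = random.randrange(num_items)
    while j in interactions[u]:
      j = random.randrange(num_items)
    yield u, i, j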

Here is how the model is defined. I used PyTorch Geometric to compute the message passing in a graph convolutional network.

import torch.nn as nn
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import degree

class GCMCLayer(MessagePassing):
  def __init__(self, in_channel, out_channel):
    super(GCMCLayer, self).__init__(aggr='add')
    self.lin = nn.Linear(in_channel, out_channel)

  def forward(self, x, edge_index, N, M):
    # x is a (source, target) pair of feature matrices for the bipartite graph
    x_first, x_second = x
    x_first = self.lin(x_first)

    row, col = edge_index

    # symmetric normalization: 1 / sqrt(deg(u) * deg(i)) per edge
    deg_i = degree(col, M)
    deg_u = degree(row, N)
    deg_inv_i = deg_i.pow(-0.5)
    deg_inv_u = deg_u.pow(-0.5)
    norm = deg_inv_u[row] * deg_inv_i[col]

    return self.propagate(edge_index, x=(x_first, x_second), norm=norm, size=(N, M))

  def message(self, x_j, norm):
    # weight each incoming source feature by the symmetric norm
    return norm.view(-1, 1) * x_j

class DenseLayer(nn.Module):
  def __init__(self, in_channel1, out_channel):
    super(DenseLayer, self).__init__()
    self.lin1 = nn.Linear(in_channel1, out_channel, bias=True)
    self.lin2 = nn.Linear(out_channel, out_channel)  # currently unused
    self.lin3 = nn.Linear(out_channel, out_channel)
    self.batch = nn.BatchNorm1d(out_channel)

  def forward(self, x, hi):
    # x: raw node features, hi: aggregated messages from the GCMC layer
    x = self.lin1(x)
    x = F.relu(x)
    hi = self.lin3(hi)
    x = x + hi  # out-of-place add; in-place += can break autograd here
    return self.batch(F.relu(x))

class GC_encoder(nn.Module):
  def __init__(self, out_channel):
    super(GC_encoder, self).__init__()
    # each convolution aggregates features from the *other* node type,
    # so its in_channel is the other side's feature size
    self.gconv_u = GCMCLayer(train.x_i.shape[1], out_channel)
    self.gconv_a = GCMCLayer(train.x_u.shape[1], out_channel)

    self.dense_u = DenseLayer(train.x_u.shape[1], out_channel)
    self.dense_a = DenseLayer(train.x_i.shape[1], out_channel)

  def forward(self, data):
    xu, xa, edge_index = data.x_u, data.x_i, data.edge_index

    # item embeddings: aggregate user features along user -> item edges
    V = self.gconv_a((xu, xa), edge_index, N=xu.shape[0], M=xa.shape[0])
    xa_new = self.dense_a(xa, V)

    # user embeddings: aggregate item features along the reversed edges
    U = self.gconv_u((xa, xu), edge_index[torch.tensor([1, 0])], N=xa.shape[0], M=xu.shape[0])
    xu_new = self.dense_u(xu, U)

    return (xu_new, xa_new)
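
For completeness, here is a minimal way to exercise GCMCLayer on a toy bipartite graph (toy shapes and random features, just to show the expected input format):

import torch

# toy bipartite graph: 3 users, 2 items, 4 interactions
x_u = torch.randn(3, 8)                    # user features
x_i = torch.randn(2, 8)                    # item features
edge_index = torch.tensor([[0, 1, 2, 0],   # user (source) indices
                           [0, 0, 1, 1]])  # item (target) indices

layer = GCMCLayer(in_channel=8, out_channel=16)
out = layer((x_u, x_i), edge_index, N=3, M=2)
print(out.shape)  # torch.Size([2, 16]): one embedding per item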