Training loss does not change at all

I am working on an NLP task and I’m trying to train my model to predict whether two documents are relevant to each other. The input is a pair of document representations; the model applies a linear transformation to each, computes their cosine similarity, and applies a sigmoid. However, the loss does not change at all during training.

Here is my model:

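import torch
import torch.nn as nn
import torch.nn.functional as F

class Relevance_Binary(nn.Module):
    def __init__(self):
        super().__init__()  # must run before any submodule is assigned
        self.linear_art = nn.Linear(512, 1)  # collapses the 512 article positions
        self.linear_q = nn.Linear(256, 1)    # collapses the 256 query positions
        self.sigmoid = nn.Sigmoid()
        self.cosine_similarity = F.cosine_similarity

    def forward(self, q, art):
        # q: [B, 256, 768] -> [B, 768, 256] -> [B, 768, 1]
        q_linear = self.linear_q(torch.transpose(q, 1, 2))
        # art: [B, 512, 768] -> [B, 768, 512] -> [B, 768, 1]
        art_linear = self.linear_art(torch.transpose(art, 1, 2))
        sim = self.cosine_similarity(q_linear, art_linear)  # over dim=1 -> [B, 1]
        sim = self.sigmoid(sim)
        return sim.squeeze(-1)  # [B]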

Here is my training loop:

def train_loop(dataloader, model, optimizer):
    Loss = list()
    for batch in dataloader:
        a = batch[0].to(device)  # labels; usually between 1 and 3 per example
        q = batch[1].to(device)  # [BATCH_SIZE, 256, 768]
        for item in articles:
            art = articles[item].to(device)
            art = art.expand(BATCH_SIZE, 512, 768)
            sim = model(q, art)
            # first attempt: rebuild the loss with torch.tensor(...)
            loss = torch.tensor([-sim[i] * pos_weight if item in a[i] else sim[i] for i in range(BATCH_SIZE)], requires_grad=True)
            # second attempt: accumulate the per-example terms directly
            loss = 0
            for i in range(BATCH_SIZE):
                if item in a[i]:
                    loss += -sim[i] * pos_weight
                else:
                    loss += sim[i]
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            Loss.append(loss.item())
    return Loss

I call the training loop as follows:

pos_weight = 580488 // 763
model = Relevance_Binary().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(50):
    print('starting epoch {}'.format(epoch))
    train_loop(train_dataloader, model, optimizer)

I suspected that the way I compute my loss might be the cause of the problem:
loss = torch.tensor([-sim[i] * pos_weight if item in a[i] else sim[i] for i in range(BATCH_SIZE)])
so I tried passing requires_grad=True, and I also tried computing the loss with criterion = nn.BCEWithLogitsLoss(weight=weight), but neither worked. Could you point out what the problem might be?
I would greatly appreciate any comment. Thank you!
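On the `nn.BCEWithLogitsLoss` attempt: this criterion applies the sigmoid itself, so it expects the raw score (the similarity before the model's own sigmoid); feeding it already-sigmoided values applies the sigmoid twice. Its `weight` argument rescales each element's loss, while `pos_weight` up-weights only the positive examples, which is closer to what `pos_weight` does in the loop above. A minimal sketch, assuming hypothetical scores and 0/1 float targets rather than the real data:

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits): the similarity *before* any sigmoid, shape [B].
scores = torch.tensor([0.3, -0.2, 0.8, 0.1], requires_grad=True)
# Hypothetical float targets: 1.0 for a relevant pair, 0.0 otherwise.
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])

# pos_weight scales the positive-example term (here 580488 // 763 = 760).
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([760.0]))
loss = criterion(scores, targets)  # sigmoid is applied internally; mean over the batch
loss.backward()                    # gradients flow back to `scores`

print(loss.item(), scores.grad)
```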

Yes. By building a new tensor with torch.tensor(...), you copy only the values and detach the loss from the computation graph, so no gradient reaches the model parameters and they never update. Create the loss tensor with differentiable operations instead, such as torch.stack or torch.where.
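A small, self-contained illustration of the difference, assuming a toy tensor `w` in place of the model parameters and a hypothetical `pos_weight` of 2.0: the `torch.tensor(...)` version produces a leaf with no `grad_fn`, so `backward()` never reaches `w`, while `torch.stack` over the same values keeps the graph intact.

```python
import torch

# Stand-in for the model output: `sim` is attached to the autograd graph via `w`.
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
sim = torch.sigmoid(w)  # shape [3], has a grad_fn
pos_weight = 2.0        # hypothetical value for illustration

# torch.tensor(...) copies only the values into a brand-new leaf tensor.
detached = torch.tensor([-sim[i] * pos_weight for i in range(3)], requires_grad=True)
detached.sum().backward()
grad_after_detached = w.grad  # None: no gradient flowed back to w

# torch.stack builds the same values but keeps every element in the graph.
stacked = torch.stack([-sim[i] * pos_weight for i in range(3)])
stacked.sum().backward()
grad_after_stack = w.grad.clone()  # non-zero: gradients now reach w

print(detached.grad_fn, grad_after_detached)
print(stacked.grad_fn is not None, grad_after_stack)
```

Passing `requires_grad=True` to `torch.tensor` does not help: it only makes the new leaf itself trainable, it does not reconnect it to `sim`.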

Using solved the problem. Thank you very much!
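For reference, the per-example rule from the loop (`-sim * pos_weight` for relevant pairs, `+sim` otherwise) can be written as a single `torch.where` call. The sketch below is a hedged illustration, not the poster's final code: `weighted_relevance_loss`, the boolean mask `is_relevant` (which might be built from `item in a[i]`), and the tiny stand-in model are all hypothetical. It only shows that the loss moves once the graph is intact.

```python
import torch
import torch.nn as nn

def weighted_relevance_loss(sim: torch.Tensor, is_relevant: torch.Tensor, pos_weight: float) -> torch.Tensor:
    """Hypothetical helper: -sim * pos_weight where relevant, +sim elsewhere, summed.

    sim: [B] scores still attached to the graph; is_relevant: [B] bool mask.
    """
    per_example = torch.where(is_relevant, -sim * pos_weight, sim)
    return per_example.sum()

# Tiny stand-in model (not the model from the question) so the example runs alone.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
is_relevant = torch.tensor([True, False] * 4)

losses = []
for step in range(5):
    sim = model(x).squeeze(-1)  # [8]
    loss = weighted_relevance_loss(sim, is_relevant, pos_weight=760.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(losses)
```

With such a large `pos_weight` and `lr=0.1`, the toy model saturates the sigmoid after a step or two; the point is only that the parameters, and therefore the loss, now change.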