I am working on an NLP task and I’m trying to train a model to predict whether two documents are relevant to each other. The input is a pair of document representations; the model applies a linear transformation to each, computes the cosine similarity between the results, and passes that through a sigmoid. The problem is that the loss doesn’t change at all during training.
Below is my model:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Relevance_Binary(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_art = nn.Linear(512, 1)
        self.linear_q = nn.Linear(256, 1)
        self.sigmoid = nn.Sigmoid()
        self.cosine_similarity = F.cosine_similarity

    def forward(self, q, art):
        q_linear = self.linear_q(torch.transpose(q, 0, 1))
        art_linear = self.linear_art(torch.transpose(art, 0, 1))
        sim = self.cosine_similarity(q_linear, art_linear)
        sim = self.sigmoid(sim)
        return sim
Below is my training loop:
def train_loop(dataloader, model, optimizer):
    Loss = list()
    model.train()
    for batch in dataloader:
        a = batch[0].to(device)  # number of labels; usually between 1 and 3
        q = batch[1].to(device)  # [BATCH_SIZE, 256, 768]
        for item in articles:
            art = articles[item].to(device)
            art = art.expand(BATCH_SIZE, 512, 768)
            sim = model(q, art)
            loss = torch.tensor([-sim[i] * pos_weight if item in a else sim[i] for i in range(BATCH_SIZE)], requires_grad=True)
            loss = 0
            for i in range(BATCH_SIZE):
                if item in a[i]:
                    loss += -sim[i] * pos_weight
                else:
                    loss += sim[i]
            loss.backward()
            Loss.append(loss)
I call the training loop the following way:
pos_weight = 580488 // 763
BATCH_SIZE = 4
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(50):
    print('starting epoch {}'.format(epoch))
    train_loop(train_dataloader, model, optimizer)
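One thing that stands out when rereading the loop and the call above: `loss.backward()` is called, but `optimizer.zero_grad()` and `optimizer.step()` never are, so the optimizer is created and passed in but never applied. That may be part of why nothing moves. The snippet below is only a rough sketch of that effect, not the model above: `toy_model` and `x` are hypothetical stand-ins chosen to keep it self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real model and batch (assumption, not the original data).
toy_model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
x = torch.randn(4, 8)

before = toy_model.weight.detach().clone()

# backward() only fills .grad; it does not touch the weights.
loss = torch.sigmoid(toy_model(x)).sum()
loss.backward()
unchanged_after_backward = torch.equal(before, toy_model.weight.detach())

# step() is what actually applies the update.
optimizer.step()
changed_after_step = not torch.equal(before, toy_model.weight.detach())

# Clear gradients so the next backward() does not accumulate into them.
optimizer.zero_grad()

print(unchanged_after_backward, changed_after_step)
```

In the loop above, the equivalent change would be calling `optimizer.zero_grad()` before `loss.backward()` and `optimizer.step()` after it, assuming one update per `item` is what is intended.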
I suspected that the way I compute my loss might be the cause of the problem:
loss = torch.tensor([-sim[i] * pos_weight if item in a else sim[i] for i in range(BATCH_SIZE)])
so I tried adding requires_grad=True, and also tried computing the loss with criterion = nn.BCEWithLogitsLoss(weight=weight), but neither made any difference. Could you point out what the problem might be?
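For anyone checking the `torch.tensor([...])` line: as far as I understand, `torch.tensor` copies the values into a brand-new leaf tensor, so `requires_grad=True` only tracks gradients for that copy and not for the model parameters. A small sketch of the difference against `torch.stack`, using a hypothetical parameter `w` and input `x` in place of the real model:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins (assumption): w plays the role of a model parameter.
w = torch.randn(3, requires_grad=True)
x = torch.randn(4, 3)
sim = torch.sigmoid(x @ w)  # shape [4], connected to w through the graph

# Rebuilding with torch.tensor copies the values into a new leaf tensor.
rebuilt = torch.tensor([sim[i] for i in range(4)], requires_grad=True)
rebuilt_is_leaf = rebuilt.is_leaf
rebuilt.sum().backward()
grad_after_rebuilt = w.grad  # still None: nothing flowed back to w

# torch.stack keeps each element attached to the original graph.
stacked = torch.stack([sim[i] for i in range(4)])
stacked.sum().backward()
grad_after_stack = w.grad  # now populated

print(rebuilt_is_leaf, grad_after_rebuilt is None, grad_after_stack is not None)
```

The accumulated `loss += ...` version in the training loop does stay connected to `sim`, so this would only explain the `torch.tensor` attempt, not the loop as posted.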
I would greatly appreciate any comment. Thank you!