Hello, I want to implement the mini-batch gradient descent step below:
where d(h+l, t) is the distance between the two vectors h+l and t, γ (gamma) is the margin, and [·]+ is the maximum of 0 and the value inside it.
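To make the notation concrete, here is a minimal sketch of one per-sample term of this loss (the function name and the toy vectors are just placeholders, not part of my model):

```python
import torch

def transe_sample_loss(h, l, t, h_n, t_n, gamma):
    # d(h + l, t): L2 distance between translated head and tail
    d_pos = (h + l - t).norm(p=2)
    # same distance for the corrupted (negative) triple, sharing l
    d_neg = (h_n + l - t_n).norm(p=2)
    # [gamma + d_pos - d_neg]+ : zero once the negative triple is
    # at least gamma farther away than the positive one
    return torch.clamp(gamma + d_pos - d_neg, min=0)
```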
Now I have two implementations of this, and they behave differently, but I don't know which one is correct for the algorithm above.
- Version 1: compute the per-sample losses of the mini-batch and call `.backward()` on `loss.mean()`
```python
class Model(nn.Module):
    ...
    def forward(self, p_h, p_l, p_t, n_h, n_l, n_t):
        dis1 = (self.entity_emb(p_h) + self.relation_emb(p_l) - self.entity_emb(p_t)).norm(p=2, dim=1)
        dis2 = (self.entity_emb(n_h) + self.relation_emb(n_l) - self.entity_emb(n_t)).norm(p=2, dim=1)
        return self.loss(dis1, dis2)

    def loss(self, dis1, dis2):
        # target = -1 asks for dis1 < dis2, i.e. max(0, dis1 - dis2 + margin)
        target = torch.tensor([-1])
        criterion = nn.MarginRankingLoss(margin=gamma, reduction='none')
        return criterion(dis1, dis2, target)

...
model = Model()
for batch_index in ...:  # mini-batch loop
    optimizer.zero_grad()
    loss = model(p_h, p_l, p_t, n_h, n_l, n_t)  # per-sample losses, shape (batch_size,)
    total_loss += loss.sum().item()
    loss.mean().backward()
    optimizer.step()
```
- Version 2: compute the sum of the mini-batch losses and call `.backward()` on it directly
```python
class Model(nn.Module):
    ...
    def forward(self, p_h, p_l, p_t, n_h, n_l, n_t):
        dis1 = (self.entity_emb(p_h) + self.relation_emb(p_l) - self.entity_emb(p_t)).norm(p=2, dim=1)
        dis2 = (self.entity_emb(n_h) + self.relation_emb(n_l) - self.entity_emb(n_t)).norm(p=2, dim=1)
        dis_diff = self.gamma + dis1 - dis2
        return torch.sum(F.relu(dis_diff))  # scalar: sum of [gamma + dis1 - dis2]+ over the batch

...
model = Model()
for batch_index in ...:  # mini-batch loop
    optimizer.zero_grad()
    loss = model(p_h, p_l, p_t, n_h, n_l, n_t)
    total_loss += loss.item()
    loss.backward()
    optimizer.step()
```
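One thing I did notice (toy check below, not my real model): with identical per-sample losses, the gradients from the sum reduction are exactly the mean-reduction gradients scaled by the batch size, so the two versions effectively train with different learning rates.

```python
import torch

# Toy stand-in for the per-sample losses: mean vs sum reduction
# only changes the gradient by a factor of the batch size.
batch_size = 4
w = torch.ones(batch_size, requires_grad=True)
losses = (w * torch.arange(1.0, batch_size + 1)) ** 2  # fake per-sample losses

losses.mean().backward(retain_graph=True)  # keep the graph for the second backward
grad_mean = w.grad.clone()

w.grad.zero_()
losses.sum().backward()
grad_sum = w.grad.clone()

scale_matches = torch.allclose(grad_sum, grad_mean * batch_size)  # True
```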
So which one should I use and why?