Hi everyone,
Say I’m training a model using triplet loss, trying to learn distances between pairs of objects. While training, I have an anchor `a`, a positive item `pos`, and a negative item `neg`. Also, for simplicity, let’s say that the model is this one:
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.common_layer = CNN()

    def forward(self, anchor, pos, neg):
        anchor = self.common_layer(anchor)
        pos = self.common_layer(pos)
        neg = self.common_layer(neg)
        return anchor * (pos - neg)
```
Notice that in the forward step I’m using the same layer with 3 different objects.
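Since the same `common_layer` is reused, its weights accumulate gradient contributions from all three branches in a single `.grad` buffer. A minimal sketch that checks this (an `nn.Linear` stands in for the `CNN`, and the shapes are made up):

```python
import torch
import torch.nn as nn

# Stand-in for CNN(): a tiny linear layer with made-up shapes.
common_layer = nn.Linear(4, 4)
anchor, pos, neg = torch.randn(3, 1, 4).unbind(0)

# Same forward as the model above: the layer is applied three times.
out = (common_layer(anchor) * (common_layer(pos) - common_layer(neg))).sum()
out.backward()

# common_layer.weight.grad now mixes the anchor, pos, and neg contributions,
# so the optimizer (and its weight_decay) only ever sees their sum.
```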
While training, I would like to apply L2 regularization (using the `weight_decay` argument of the optimizer), but with a different lambda value for the anchor (`lambda_a`), the positive item (`lambda_pos`), and the negative item (`lambda_neg`). Is that possible? Is this somehow a “bad practice”?
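As far as I understand, `weight_decay` is attached to parameter groups, not to forward-pass branches, so per-branch lambdas don’t seem to fit the optimizer API directly. For example (a sketch with a made-up two-layer model):

```python
import torch
import torch.nn as nn

# Made-up model with two *separate* layers, just to show that
# weight_decay is a per-parameter-group setting in PyTorch.
model = nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 1))

optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 1e-4},
        {"params": model[1].parameters(), "weight_decay": 1e-2},
    ],
    lr=0.1,
)
```

Since `common_layer` is a single parameter set shared by all three branches, this mechanism cannot give it three different lambdas.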
If my understanding is correct, this comment in a GitHub issue seems to imply that applying multiple optimizers to the same layer is undesirable (that’s why the suggestion was to raise an exception).
PS: This is a simplification of something I saw in a paper implementation. The paper was not implemented in PyTorch, and other implementations seem to ignore this detail. This is a version of the snippet, where the update step was implemented manually (C++):
```cpp
double anchor   = common_repr[anchor_id][f];
double pos_item = common_repr[pos_item_id][f];
double neg_item = common_repr[neg_item_id][f];

common_repr[anchor_id][f]   += learn_rate * ( deri * (pos_item - neg_item) - lambda1 * anchor);
common_repr[pos_item_id][f] += learn_rate * ( deri * anchor - lambda2 * pos_item);
common_repr[neg_item_id][f] += learn_rate * (-deri * anchor - lambda2 / 10.0 * neg_item);
```
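My reading of that update is that it corresponds to adding per-item L2 penalties on the *embeddings* (not the weights) to the loss, with a different lambda per role. A sketch of that interpretation in PyTorch (all names and sizes are mine, and the base loss is a placeholder, not the paper’s):

```python
import torch

# Toy embedding table standing in for common_repr (made-up size).
common_repr = torch.randn(100, 16, requires_grad=True)

lambda_a, lambda_pos, lambda_neg = 1e-3, 1e-3, 1e-4  # e.g. lambda1, lambda2, lambda2/10

def loss_with_per_item_l2(anchor_id, pos_id, neg_id):
    a = common_repr[anchor_id]
    p = common_repr[pos_id]
    n = common_repr[neg_id]
    score = (a * (p - n)).sum()
    base_loss = -score  # placeholder for the paper's actual loss term
    # Per-item L2 penalties, mirroring lambda1 / lambda2 / lambda2/10 above.
    reg = (lambda_a * a.pow(2).sum()
           + lambda_pos * p.pow(2).sum()
           + lambda_neg * n.pow(2).sum())
    return base_loss + reg
```

Autograd of the penalty terms then reproduces the `- lambda * item` parts of the manual update (up to the factor of 2 from differentiating the square).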