Say I’m training a model using triplet loss, trying to learn distances between pairs of objects. While training, I have an anchor `a`, a positive item `pos`, and a negative item `neg`. Also, for simplicity, let’s say that the model is this one:
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.common_layer = CNN()

    def forward(self, anchor, pos, neg):
        anchor = self.common_layer(anchor)
        pos = self.common_layer(pos)
        neg = self.common_layer(neg)
        return anchor * (pos - neg)
```
Notice that in the forward step I’m using the same layer on three different objects.
While training, I would like to apply L2 regularization (using the `weight_decay` argument of the optimizer), but with a different lambda value for the anchor (`lambda_a`), the positive item (`lambda_pos`), and the negative item (`lambda_neg`). Is that possible? Is this somehow a “bad practice”?
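For context, PyTorch’s built-in mechanism for different `weight_decay` values is per-parameter groups, but those are keyed by parameter tensor, not by forward branch, so a shared layer can only receive one lambda. A minimal sketch of param groups (the two-layer model and the lambda values here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical model with two distinct layers, just to show the mechanism.
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

# One weight_decay per parameter group. A parameter may belong to only one
# group, so a layer shared across branches still gets a single lambda.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 1e-4},
        {"params": model[1].parameters(), "weight_decay": 1e-2},
    ],
    lr=0.1,
)
```

This is why `weight_decay` alone cannot express “a different lambda per branch” when all three branches go through `self.common_layer`.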
If my understanding is correct, this comment in a GitHub issue seems to imply that applying multiple optimizers to the same layer is undesirable (that’s why the suggestion was to raise an exception).
PS: This is a simplification of something I saw in a paper implementation. The paper was not implemented in PyTorch, and other implementations seem to ignore this detail. This is a version of the snippet, where the gradient update was implemented manually (C++):
```cpp
double anchor   = common_repr[anchor_id][f];
double pos_item = common_repr[pos_item_id][f];
double neg_item = common_repr[neg_item_id][f];

common_repr[anchor_id][f]   += learn_rate * ( deri * (pos_item - neg_item) - lambda1 * anchor);
common_repr[pos_item_id][f] += learn_rate * ( deri * anchor - lambda2 * pos_item);
common_repr[neg_item_id][f] += learn_rate * (-deri * anchor - lambda2 / 10.0 * neg_item);
```
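Since that update only regularizes the rows touched in each step, one way to get the same effect in PyTorch is to drop `weight_decay` entirely and add the per-item L2 penalties to the loss, letting autograd produce the `-lambda * x` terms. A hedged sketch, assuming `common_repr` is an `nn.Embedding` table and using my own lambda names (the loss here is a placeholder for the real triplet loss):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
common_repr = nn.Embedding(100, 16)  # hypothetical shared embedding table
optimizer = torch.optim.SGD(common_repr.parameters(), lr=0.05)  # no weight_decay

# Assumed per-item lambdas; the snippet used lambda1, lambda2, lambda2 / 10.
lambda_a, lambda_pos, lambda_neg = 1e-3, 1e-3, 1e-4

anchor_id, pos_id, neg_id = torch.tensor([3]), torch.tensor([7]), torch.tensor([42])
anchor = common_repr(anchor_id)
pos_item = common_repr(pos_id)
neg_item = common_repr(neg_id)

# Score as in the snippet: anchor . (pos - neg); placeholder loss = -score.
loss = -(anchor * (pos_item - neg_item)).sum()

# Per-item L2 penalties; the 0.5 factor makes d/dx equal exactly lambda * x.
loss = loss + 0.5 * (
    lambda_a * anchor.pow(2).sum()
    + lambda_pos * pos_item.pow(2).sum()
    + lambda_neg * neg_item.pow(2).sum()
)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the penalties flow through the loss, only the rows used in this step are regularized, which matches the manual C++ update; `weight_decay` would instead shrink every row of the table on every step.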