Hi everyone,
Say I’m training a model using triplet loss, trying to learn distances between pairs of objects. While training, I have an anchor `a`, a positive item `pos`, and a negative item `neg`. Also, for simplicity, let’s say that the model is this one:
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.common_layer = CNN()

    def forward(self, anchor, pos, neg):
        anchor = self.common_layer(anchor)
        pos = self.common_layer(pos)
        neg = self.common_layer(neg)
        return anchor * (pos - neg)
```
Notice that in the forward step I’m using the same layer with 3 different objects.
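Since the same `common_layer` is reused, its weights accumulate gradient contributions from all three branches in a single `.grad` buffer. A minimal sketch that checks this (an `nn.Linear` stands in for the `CNN`, and the shapes are made up):

```python
import torch
import torch.nn as nn

# Stand-in for CNN(): a tiny linear layer with made-up shapes.
common_layer = nn.Linear(4, 4)
anchor, pos, neg = torch.randn(3, 1, 4).unbind(0)

# Same forward as the model above: the layer is applied three times.
out = (common_layer(anchor) * (common_layer(pos) - common_layer(neg))).sum()
out.backward()

# common_layer.weight.grad now mixes the anchor, pos, and neg contributions,
# so the optimizer (and its weight_decay) only ever sees their sum.
```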
While training, I would like to apply L2 regularization (using the `weight_decay` argument of the optimizer), but with a different lambda value for the anchor (`lambda_a`), the positive item (`lambda_pos`), and the negative item (`lambda_neg`). Is that possible? Is this somehow a “bad practice”?
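As far as I understand, `weight_decay` is attached to parameter groups, not to forward-pass branches, so per-branch lambdas don’t seem to fit the optimizer API directly. For example (a sketch with a made-up two-layer model):

```python
import torch
import torch.nn as nn

# Made-up model with two *separate* layers, just to show that
# weight_decay is a per-parameter-group setting in PyTorch.
model = nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 1))

optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 1e-4},
        {"params": model[1].parameters(), "weight_decay": 1e-2},
    ],
    lr=0.1,
)
```

Since `common_layer` is a single parameter set shared by all three branches, this mechanism cannot give it three different lambdas.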
If my understanding is correct, this comment in a GitHub issue seems to imply that applying multiple optimizers to the same layer is undesirable (that’s why the suggestion was to raise an exception).
PS: This is a simplification of something I saw in a paper implementation. The paper was not implemented in PyTorch, and other implementations seem to ignore this detail. This is a version of the snippet, where the update step was implemented manually (C++):
```cpp
double anchor   = common_repr[anchor_id][f];
double pos_item = common_repr[pos_item_id][f];
double neg_item = common_repr[neg_item_id][f];

common_repr[anchor_id][f]   += learn_rate * ( deri * (pos_item - neg_item) - lambda1 * anchor);
common_repr[pos_item_id][f] += learn_rate * ( deri * anchor - lambda2 * pos_item);
common_repr[neg_item_id][f] += learn_rate * (-deri * anchor - lambda2 / 10.0 * neg_item);
```
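My reading of that update is that it corresponds to adding per-item L2 penalties on the *embeddings* (not the weights) to the loss, with a different lambda per role. A sketch of that interpretation in PyTorch (all names and sizes are mine, and the base loss is a placeholder, not the paper’s):

```python
import torch

# Toy embedding table standing in for common_repr (made-up size).
common_repr = torch.randn(100, 16, requires_grad=True)

lambda_a, lambda_pos, lambda_neg = 1e-3, 1e-3, 1e-4  # e.g. lambda1, lambda2, lambda2/10

def loss_with_per_item_l2(anchor_id, pos_id, neg_id):
    a = common_repr[anchor_id]
    p = common_repr[pos_id]
    n = common_repr[neg_id]
    score = (a * (p - n)).sum()
    base_loss = -score  # placeholder for the paper's actual loss term
    # Per-item L2 penalties, mirroring lambda1 / lambda2 / lambda2/10 above.
    reg = (lambda_a * a.pow(2).sum()
           + lambda_pos * p.pow(2).sum()
           + lambda_neg * n.pow(2).sum())
    return base_loss + reg
```

Autograd of the penalty terms then reproduces the `- lambda * item` parts of the manual update (up to the factor of 2 from differentiating the square).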