Kernel Regularizer

Hello, I wanted to understand how to implement a kernel regularizer (a parameter in Keras/TensorFlow layers) for a layer in PyTorch. I saw examples of how to implement a regularizer for the overall loss, but could not find the relevant documentation for this.
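
For reference, this is roughly the Keras usage I am trying to replicate (a minimal sketch; the layer size and regularization factor are arbitrary placeholders):

import tensorflow as tf

# In Keras the penalty is attached to the layer itself via kernel_regularizer;
# the factor 0.01 below is just a placeholder value.
dense = tf.keras.layers.Dense(
    64,
    kernel_regularizer=tf.keras.regularizers.l2(0.01),
)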

Could someone point me in the right direction? I saw something along the lines of what I intend to do here: “How does one implement Weight regularization (l1 or l2) manually without optimum?”.

I have adapted it as follows:

l2_reg = None

for i in model.named_parameters():
    if "layer_name.weight" in i:
        if l2_reg is None:
            l2_reg = i.norm(2)**2
        else:
            l2_reg = l2_reg + i.norm(2)**2

batch_loss = some_loss_function + l2_reg * reg_lambda
batch_loss.backward()

Is this the correct way to do so?

Thanks in advance!

The code snippet looks generally correct.
One minor issue: named_parameters() returns a tuple of (name, param) for each parameter, so you would need to use both values in the for loop, either by unpacking the tuple or by indexing into i.
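
For example, a rough sketch of the unpacked version (assuming the layer whose weights you want to penalize is registered as layer_name):

l2_reg = None

# named_parameters() yields (name, param) tuples, so unpack both values
for name, param in model.named_parameters():
    if "layer_name.weight" in name:
        if l2_reg is None:
            l2_reg = param.norm(2)**2
        else:
            l2_reg = l2_reg + param.norm(2)**2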

Hi @ptrblck, thanks for the feedback. Yes, I have accessed it through indexing in my code.
I encountered an error when I implemented it like this:

l2_reg = None

# i[0] is the parameter name, i[1] is the parameter tensor
for i in model.named_parameters():
    if "layer_name.weight" in i[0]:
        if l2_reg is None:
            l2_reg = i[1].norm(2)**2
        else:
            l2_reg = l2_reg + i[1].norm(2)**2

batch_loss = some_loss_function + l2_reg * reg_lambda
batch_loss.backward()

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
After changing the call to batch_loss.backward(retain_graph=True), this works.
Though there is a dedicated thread on this, may I take this opportunity to ask whether it is possible to avoid the use of retain_graph=True in the above code? I have read that it increases the training time in each consecutive iteration, hence I would like to avoid it if possible.
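
For what it is worth, my current guess is that the penalty has to be rebuilt from scratch inside the training loop on every iteration, rather than reusing a tensor created in an earlier iteration, roughly as in the sketch below (criterion, data_loader, optimizer, and reg_lambda are placeholders from my setup); if that is correct, I assume retain_graph=True would no longer be needed:

for inputs, targets in data_loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Rebuild the L2 penalty every iteration so that no part of a previous
    # iteration's graph is pulled into the current backward pass.
    l2_reg = None
    for name, param in model.named_parameters():
        if "layer_name.weight" in name:
            if l2_reg is None:
                l2_reg = param.norm(2)**2
            else:
                l2_reg = l2_reg + param.norm(2)**2

    batch_loss = loss + l2_reg * reg_lambda
    batch_loss.backward()
    optimizer.step()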