Quick question: if we wanted to penalize the weights of only (say) the first layer and not the others, is there an example to refer to?
Even more specifically, it would be great to be able to penalize the gradient with respect to the weights of a certain layer.
You can get the parameters of the first layer with
model.layer1.parameters() and apply your penalty only to those weights.
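A minimal sketch of that idea, assuming a model whose first layer is exposed as an attribute named `layer1` (the model definition and sizes here are illustrative, not from your code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical two-layer model; `layer1` / `layer2` are assumed names.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

model = Net()
x = torch.randn(4, 10)
target = torch.randn(4, 1)

loss = F.mse_loss(model(x), target)

# L2 penalty applied only to layer1's parameters (weight and bias);
# layer2 is left unpenalized.
l2_lambda = 1e-3
penalty = sum(p.pow(2).sum() for p in model.layer1.parameters())
loss = loss + l2_lambda * penalty
loss.backward()
```

Note that if you use `weight_decay` in the optimizer instead, it applies to every parameter group it is given, so the manual-penalty route above is the simplest way to restrict it to one layer (you could also pass per-group `weight_decay` values to the optimizer).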
To penalize the gradient of a layer, have a look at the higher-order gradients section in https://github.com/pytorch/pytorch/releases/tag/v0.2.0
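Here is a rough sketch of how that looks with `torch.autograd.grad` and `create_graph=True`, which keeps the backward graph so the gradient itself is differentiable and can enter the loss (the model and penalty weight are just placeholders):

```python
import torch
import torch.nn as nn

# Illustrative two-layer model; tanh keeps everything smooth so the
# gradient of layer1's weights genuinely depends on the other parameters.
layer1 = nn.Linear(10, 20)
layer2 = nn.Linear(20, 1)

x = torch.randn(4, 10)
out = layer2(torch.tanh(layer1(x))).sum()

# Gradient of the output w.r.t. layer1's weights, kept differentiable
# via create_graph=True so it can be penalized and backpropagated through.
g = torch.autograd.grad(out, layer1.weight, create_graph=True)[0]

# Add a squared-norm penalty on that gradient (0.1 is an arbitrary weight).
loss = out + 0.1 * g.pow(2).sum()
loss.backward()
```

After `loss.backward()`, the double-backward pass populates `.grad` on both layers, including the contribution from the gradient penalty term.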