First derivative matrix in L2 norm constraint

Dear all,
I have recently been working on a loss function that has a special L2 norm constraint.

Here G denotes the first derivative matrix for the first layer of the neural network. I have tried a lot of methods, but none of them works for this constraint. How can I implement it? Should I write a new autograd function for the first layer of the network, or implement a new optimizer?
Thank you for your kind help!


If G = d(ce_loss) / d(W1), where ce_loss is the cross-entropy loss, then you can compute this as:

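from torch import autograd

ce_loss = criterion(output, target)
# create_graph=True keeps G differentiable, so the penalty term can be backpropagated
G = autograd.grad(ce_loss, W1, create_graph=True)[0]
loss = ce_loss + (G * W1).pow(2).sum()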
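
For anyone who wants to try this end to end, here is a minimal sketch of one full training step built around the three lines above. The two-layer network, the layer sizes, the random data, and the SGD settings are all placeholders I made up for illustration and are not from the original question. The penalty is the same (G * W1).pow(2).sum() term, which may need adjusting to match the exact constraint in your loss.

```python
import torch
import torch.nn as nn
from torch import autograd

torch.manual_seed(0)

# Hypothetical two-layer network; W1 is the weight of the first layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
W1 = model[0].weight
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-in data: 16 samples, 4 features, 3 classes.
x = torch.randn(16, 4)
target = torch.randint(0, 3, (16,))

output = model(x)
ce_loss = criterion(output, target)

# create_graph=True records the gradient computation itself, so the
# penalty below stays differentiable with respect to the parameters.
G = autograd.grad(ce_loss, W1, create_graph=True)[0]
penalty = (G * W1).pow(2).sum()
loss = ce_loss + penalty

optimizer.zero_grad()
loss.backward()  # backpropagates through G as well (double backward)
optimizer.step()
```

Because the penalty is just another term in the loss, this should work with a standard optimizer, without a custom autograd function. Note that the double backward pass costs extra memory and compute compared with a plain training step.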