Forcing diagonal elements of matrix to be zero

I have a weight matrix in a neural network and I want to force the diagonal elements to be zero. All the parameters of the weight matrix are independently adjustable, except for the diagonal elements, which should be zero (i.e. the diagonal is constant and zero, but the other weights are learnable). Later on there may be different regularizations as well, but the solution here shouldn't involve regularization.

I can think of a couple of ways to do this:

- initialize the diagonal elements of the weight matrix to zero, and then set requires_grad = False for the diagonal elements only
- create a constant variable that's a mask of 1s and 0s and use it to multiply an adjustable variable, then use this product as the weight matrix

I was wondering what you would recommend as the best approach.

The first approach won't work. requires_grad is a property of an entire tensor; there is no notion of part of a tensor having requires_grad and another part not.

The second approach could work.
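A minimal sketch of the mask idea (the module name `MaskedLinear` and the layer size are illustrative assumptions): the learnable parameter is multiplied elementwise by a constant 0/1 mask on every forward pass, so the diagonal never contributes to the output and receives zero gradient through the product.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Square linear map whose weight matrix has a fixed zero diagonal.

    The full n x n parameter is learnable, but it is always used through
    the product (weight * mask), where mask is 0 on the diagonal and 1
    elsewhere. The diagonal entries therefore never affect the output,
    and their gradients are exactly zero.
    """
    def __init__(self, n):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n, n))
        # Constant mask, registered as a buffer so it moves with the
        # module (.to(device), state_dict) but is not a parameter.
        self.register_buffer("mask", 1.0 - torch.eye(n))

    def forward(self, x):
        # Use the masked product as the effective weight matrix.
        return x @ (self.weight * self.mask).t()
```

Because the mask is applied inside the forward pass, no optimizer or hook bookkeeping is needed: the effective weight always has a zero diagonal, regardless of what the underlying parameter's diagonal entries drift to.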

Lastly, another way that might be easier is to register a backward hook on the weight parameter. In the hook you can zero out the entries on the diagonal of the gradient. Note that you also need to initialize the diagonal to zero, since the hook only prevents updates from moving it away from wherever it starts.
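A sketch of the hook variant using `Tensor.register_hook` (the size `n` is an assumption): the diagonal is zeroed once at initialization, and the hook masks the diagonal of every gradient that flows into the parameter, so optimizer steps never change it.

```python
import torch
import torch.nn as nn

n = 4
weight = nn.Parameter(torch.randn(n, n))

# Start with a zero diagonal; the hook only blocks future updates.
with torch.no_grad():
    weight.fill_diagonal_(0.0)

# Zero the diagonal of the gradient during every backward pass.
mask = 1.0 - torch.eye(n)
weight.register_hook(lambda grad: grad * mask)
```

One caveat with this style: the hook only filters gradients from autograd, so anything the optimizer adds outside of `.grad` (e.g. decoupled weight decay) is not masked. Here that is harmless, since a decay term proportional to an already-zero diagonal is itself zero.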