# How to set the diagonal entries of a Linear layer's weight matrix to be always negative?

I want to create a neural network with a single linear layer whose weight matrix has diagonal entries that will always be negative (even during training). I have tried various approaches, but nothing seems to work. The closest I have got is the code below:

```
import torch
import torch.nn as nn

class IdentityMask(nn.Module):
    def __init__(self, n):
        super(IdentityMask, self).__init__()
        self.n = n
        self.weight = torch.ones(n, n)
        torch.diagonal(self.weight).fill_(-1.0)

class LinearLayer(nn.Module):
    def __init__(self, n):
        super(LinearLayer, self).__init__()
        self.linear = nn.Linear(n, n, bias=False)
        nn.init.normal_(self.linear.weight, mean=0.0, std=0.01)
        self.identity_mask = IdentityMask(n)

    def forward(self, x):
        self.linear.weight.data *= self.identity_mask.weight
        x = self.linear(x)
        return x
```

What should I do to get the diagonal entries of `self.linear.weight` to be negative?

OK. I figured out what the problem is. The line `self.linear.weight.data *= self.identity_mask.weight` multiplies the diagonal entries by -1 on every forward pass. But what we want is for a diagonal entry to be multiplied by -1 only when it turns positive. So I have modified the code as follows:

```
class LinearLayer(nn.Module):
    def __init__(self, n):
        super(LinearLayer, self).__init__()
        self.linear = nn.Linear(n, n, bias=False)
        nn.init.normal_(self.linear.weight, mean=0.0, std=0.01)

    def forward(self, x):
        diagonal = torch.diag(self.linear.weight)
        for i in range(diagonal.size(0)):
            if diagonal[i] > 0:
                diagonal[i] *= -1.0
                self.linear.weight.data[i][i] = diagonal[i]
        print("Diagonal entries:", diagonal)
        x = self.linear(x)
        return x
```

No need for a mask layer. This code works. But if there are any better ways to achieve my objective, please let me know.

Note that a more efficient implementation is possible:

```
import torch

a = torch.randn(10, 10)

print(a.diagonal())
# multiplying each diagonal entry d by -sign(d) gives -|d|:
# positive entries are flipped, negative entries are left unchanged
a.diagonal().mul_(-a.diagonal().sign())
print(a.diagonal())
```

Also, you can use `with torch.no_grad():` instead of `.data`.
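
For example, here is a minimal sketch of the `forward()` above rewritten with `torch.no_grad()` (using `-d.abs()` as a compact equivalent of flipping only the positive entries):

```
    def forward(self, x):
        # adjust the parameter in place without recording the operation in autograd
        with torch.no_grad():
            d = self.linear.weight.diagonal()
            d.copy_(-d.abs())  # negative entries unchanged, positive entries flipped
        return self.linear(x)
```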

Finally, this looks like a good candidate for a parametrization (see `torch.nn.utils.parametrize.register_parametrization` in the PyTorch 2.0 documentation) if you want to be able to do this without having to rewrite a `Module` by hand.
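
A minimal sketch of what such a parametrization could look like (the `NegDiag` module name here is made up for illustration):

```
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class NegDiag(nn.Module):
    # maps an unconstrained matrix to one whose diagonal is -|diagonal|
    def forward(self, X):
        W = X.clone()
        W.diagonal().copy_(-X.diagonal().abs())  # reads X, writes into the clone
        return W

linear = nn.Linear(4, 4, bias=False)
parametrize.register_parametrization(linear, "weight", NegDiag())
print(linear.weight.diagonal())  # non-positive by construction
```

The optimizer then trains the underlying unconstrained parameter (stored as `linear.parametrizations.weight.original`), and `linear.weight` is recomputed from it with the constraint applied.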

Hi Bala (and Alban)!

My intuition is that it is better to smoothly map an unconstrained trainable
parameter (that runs over `(-inf, inf)`) to a new tensor whose diagonal
is negative (and runs over `(-inf, 0.0)`), rather than brute-force flip the
sign of the diagonal. It is straightforward and conceptually satisfying to
train the unconstrained parameter and understand the negative-diagonal
tensor as an intermediate result.

Suppose during training your optimizer moves a slightly negative diagonal
entry to a slightly positive value. You then flip it back to negative. But on
the next iteration, the optimizer moves it back to a positive value. While
conceptually acceptable, it just seems to me that this is likely to throw a
little bit of sand in the optimization process (and possibly confuse fancier
optimizers such as `Adam`).

`-exp()` is a well-behaved function that maps to strictly negative values.

Here is an illustration:

```
>>> import torch
>>> torch.__version__
'2.0.0'
>>>
>>> _ = torch.manual_seed (2023)
>>>
>>> preWeight = torch.randn (5, 5, requires_grad = True)   # unconstrained trainable parameter
>>> preWeight                                              # unconstrained diagonal -- can be positive
tensor([[ 0.4305, -0.3499,  0.4749,  0.9041, -0.7021],
        [ 1.5963,  0.4228, -0.6940,  0.9672, -0.5319],
        [ 0.8088, -0.1603,  0.8184, -0.6093,  0.8177],
        [ 0.1459, -0.9558, -1.3761,  1.3246, -0.0744],
        [ 0.5472,  1.6779,  0.8275, -1.0542, -0.7374]], requires_grad=True)
>>> weight = preWeight.clone()
>>> weight.diagonal().copy_ (-preWeight.diagonal().exp())
tensor([-1.5380, -1.5262, -2.2668, -3.7607, -0.4784], grad_fn=<AsStridedBackward0>)
>>> weight                                                 # derived weight tensor with negative diagonal
tensor([[-1.5380, -0.3499,  0.4749,  0.9041, -0.7021],
        [ 1.5963, -1.5262, -0.6940,  0.9672, -0.5319],
        [ 0.8088, -0.1603, -2.2668, -0.6093,  0.8177],
        [ 0.1459, -0.9558, -1.3761, -3.7607, -0.0744],
        [ 0.5472,  1.6779,  0.8275, -1.0542, -0.4784]], grad_fn=<CopySlices>)
>>> x = torch.randn (5, 5)
>>> (weight @ x).sum().backward()
>>> preWeight.grad                                         # gradient flows back to the unconstrained parameter
tensor([[-1.2142,  0.9742,  3.3650, -1.4189,  2.5436],
        [ 0.7895, -1.4869,  3.3650, -1.4189,  2.5436],
        [ 0.7895,  0.9742, -7.6279, -1.4189,  2.5436],
        [ 0.7895,  0.9742,  3.3650,  5.3361,  2.5436],
        [ 0.7895,  0.9742,  3.3650, -1.4189, -1.2168]])
```

(Also, as Alban notes, you could register such a mapping as a
parameterization.)

Best.

K. Frank


Hi,

I am revisiting this problem after a long time. I am essentially trying to follow the idea, suggested by KFrank, of using a smooth map (`-exp()`) applied to the diagonal elements. But now I am getting a different error that seems to have to do with autograd and backpropagation. The error that I get is: "Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward."

Here is the new code:

```
import torch
import torch.nn as nn

class NegDiagLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(NegDiagLinear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.pre_weight = nn.Parameter(torch.Tensor(out_features, in_features))
        nn.init.normal_(self.pre_weight, mean=0, std=0.01)
        self.weight = self.pre_weight.clone()

    def forward(self, input):
        self.weight.diagonal().copy_ (-self.pre_weight.diagonal().exp())
        return input @ self.weight

class simple_model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = NegDiagLinear(2, 2)

    def forward(self, x):
        return self.linear(x)

model = simple_model()

define_criterion = torch.nn.MSELoss()

SGD_optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# x and y are the training tensors, defined elsewhere
for epoch in range(18):
    predict_y = model(x)
    loss = define_criterion(predict_y, y)
    loss.backward()
    SGD_optimizer.step()
    print('epoch {}, loss function {}'.format(epoch, loss.item()))
```

My initial guess is that this is due to the `clone()` applied to `self.pre_weight`. So I dabbled around this, trying to see if I could get around the problem, but I can't seem to get rid of the error. Please help me solve this problem. If you could also provide an explanation of the cause of this error, that would be nice.

Hi Bala!

```
        self.weight = self.pre_weight.clone()
```

creates the part of the computation graph that connects the off-diagonal
elements of `weight` to `pre_weight` only once (in `__init__()`).

Then:

```
        self.weight.diagonal().copy_ (-self.pre_weight.diagonal().exp())
```

creates the computation graph for the diagonal elements every time you call
`forward()`.

But your first call to `.backward()` frees the whole computation graph, including
that for the off-diagonal elements, and the off-diagonal piece is never rebuilt. So
your second call to `.backward()` raises the "backward through the graph a second
time" error.

Try:

```
    def forward(self, input):
        weight = self.pre_weight.clone()
        weight.diagonal().copy_ (-self.pre_weight.diagonal().exp())
        return input @ weight
```

`forward()` now rebuilds the whole computation graph (both the off-diagonal and
diagonal parts) every time it is called.

Note that `weight` is no longer a property of `NegDiagLinear`; it's now just a
local variable in `forward()`.
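
As a quick sanity check (with toy data made up for illustration; shapes chosen to match `NegDiagLinear(2, 2)`), the training loop from your previous post now runs without the error:

```
import torch

model = simple_model()  # uses the fixed forward() above
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

x = torch.randn(8, 2)  # toy inputs
y = torch.randn(8, 2)  # toy targets

for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()  # the graph is rebuilt by each forward pass, so repeated backward calls work
    optimizer.step()
```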

Best.

K. Frank