Hi Bala (and Alban)!

My intuition is that it is better to smoothly map an unconstrained trainable
parameter (that runs over `(-inf, inf)`) to a new tensor whose diagonal
is negative (and runs over `(-inf, 0.0)`), rather than brute-force flip the
sign of the diagonal. It is straightforward and conceptually satisfying to
train the unconstrained parameter and understand the negative-diagonal
tensor as an intermediate result.

Suppose during training your optimizer moves a slightly negative diagonal
entry to a slightly positive value. You then flip it back to negative, but on
the next iteration, the optimizer moves it back to a positive value. While
conceptually acceptable, it seems to me that this is likely to throw a
little bit of sand in the optimization process (and possibly confuse fancier
optimizers such as `Adam`). In contrast, `-exp()` is a well-behaved function
that maps smoothly to strictly negative values.

Here is an illustration:

```
>>> import torch
>>> torch.__version__
'2.0.0'
>>>
>>> _ = torch.manual_seed (2023)
>>>
>>> preWeight = torch.randn (5, 5, requires_grad = True) # unconstrained trainable parameter
>>> preWeight # unconstrained diagonal -- can be positive
tensor([[ 0.4305, -0.3499, 0.4749, 0.9041, -0.7021],
[ 1.5963, 0.4228, -0.6940, 0.9672, -0.5319],
[ 0.8088, -0.1603, 0.8184, -0.6093, 0.8177],
[ 0.1459, -0.9558, -1.3761, 1.3246, -0.0744],
[ 0.5472, 1.6779, 0.8275, -1.0542, -0.7374]], requires_grad=True)
>>> weight = preWeight.clone()
>>> weight.diagonal().copy_ (-preWeight.diagonal().exp())
tensor([-1.5380, -1.5262, -2.2668, -3.7607, -0.4784],
grad_fn=<AsStridedBackward0>)
>>> weight # derived weight tensor with negative diagonal
tensor([[-1.5380, -0.3499, 0.4749, 0.9041, -0.7021],
[ 1.5963, -1.5262, -0.6940, 0.9672, -0.5319],
[ 0.8088, -0.1603, -2.2668, -0.6093, 0.8177],
[ 0.1459, -0.9558, -1.3761, -3.7607, -0.0744],
[ 0.5472, 1.6779, 0.8275, -1.0542, -0.4784]], grad_fn=<CopySlices>)
>>> x = torch.randn (5, 5)
>>> (weight @ x).sum().backward()
>>> preWeight.grad # gradients flow back to trainable parameter
tensor([[-1.2142, 0.9742, 3.3650, -1.4189, 2.5436],
[ 0.7895, -1.4869, 3.3650, -1.4189, 2.5436],
[ 0.7895, 0.9742, -7.6279, -1.4189, 2.5436],
[ 0.7895, 0.9742, 3.3650, 5.3361, 2.5436],
[ 0.7895, 0.9742, 3.3650, -1.4189, -1.2168]])
```

(Also, as Alban notes, you could register such a mapping as a
parameterization.)
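
For concreteness, here is one way the parameterization approach might look,
using `torch.nn.utils.parametrize.register_parametrization()`. (The
`NegExpDiagonal` module and its name are my own invention for this sketch;
it reproduces the `-exp()` mapping from the illustration above, but without
in-place operations.)

```
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class NegExpDiagonal(nn.Module):
    # parametrization: replace the diagonal of X with -exp(diagonal of X)
    def forward(self, X):
        d = X.diagonal()
        # subtract the original diagonal (zeroing it), then subtract exp(d)
        # so that the resulting diagonal is -exp(d), i.e., strictly negative
        return X - torch.diag(d) - torch.diag(d.exp())

lin = nn.Linear(5, 5)
parametrize.register_parametrization(lin, "weight", NegExpDiagonal())

# lin.weight is now computed from the unconstrained underlying parameter
# (stored as lin.parametrizations.weight.original) and always has a
# strictly negative diagonal; the optimizer trains the unconstrained
# parameter, and gradients flow through the mapping automatically.
print(lin.weight.diagonal())
```

With this in place you train `lin` exactly as usual; you never touch the
constrained `weight` directly, so there is no sign-flipping step to fight
the optimizer.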

Best.

K. Frank