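```
import torch
from torch import nn

class MyLayer(nn.Module):
    def __init__(self, num_units):
        super().__init__()  # must run before any Parameter is assigned
        self.weight = nn.Parameter(torch.rand(num_units, num_units))
```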

I want to optimize all of the weights in the tensor except the ones on the diagonal (i.e. the diagonal weights should stay fixed). What is the simplest way to keep these diagonal weights from being updated when I perform backpropagation (with `loss.backward()`)?
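
To make the goal concrete, here is a minimal sketch of one approach that might work: zeroing the diagonal of the gradient with a tensor hook, so the optimizer sees no update for those entries. The class name `MaskedDiagLayer`, the buffer name `mask`, the `forward` definition, and the use of plain SGD are all assumptions made for illustration, not part of the original layer.

```python
import torch
from torch import nn

class MaskedDiagLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, num_units):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(num_units, num_units))
        # Mask is 0 on the diagonal and 1 elsewhere; as a buffer it moves with the module.
        self.register_buffer("mask", 1.0 - torch.eye(num_units))
        # The hook replaces the gradient of self.weight each time backward computes it.
        self.weight.register_hook(lambda grad: grad * self.mask)

    def forward(self, x):  # assumed forward pass: a plain matrix product
        return x @ self.weight

layer = MaskedDiagLayer(4)
before = layer.weight.detach().clone()
opt = torch.optim.SGD(layer.parameters(), lr=0.1)  # assumed optimizer, no weight decay

loss = layer(torch.rand(8, 4)).sum()
loss.backward()
opt.step()

# The diagonal gradient is zero, so SGD leaves those entries untouched.
diag_unchanged = torch.equal(torch.diagonal(layer.weight.detach()), torch.diagonal(before))
```

This only masks the gradient. An optimizer that applies weight decay would still change the diagonal entries, so in that case the diagonal would need to be restored after each step or kept in a separate non-trainable tensor.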