If I have a convolutional neural network (with no pooling or anything else besides a single convolutional filter, because I only want to apply blurring to my images) that looks like:

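import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Single 5x5 filter; padding=2 keeps the output the same size as the input.
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2)

    def forward(self, x):
        x = self.conv1(x)
        return x

model = Net()
# The optimizer and learning rate are placeholders; the original post does not show them.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.train()
# train_loader is assumed to yield (original image, blurred target) pairs.
for batch_idx, (main, blur) in enumerate(train_loader):
    optimizer.zero_grad()
    output = model(main)
    loss = F.mse_loss(output, blur)
    loss.backward()
    optimizer.step()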

Why is it not relevant? You need to define how the weight matrix should be updated when the gradient is asymmetric, so that your constraint is still satisfied after the update.

It doesn't really matter to me how the weight matrix gets updated, as long as it satisfies the constraint and gives the lowest MSE loss achievable under that constraint. What do you mean by the gradient being asymmetric?

(1) You can parameterize your weight tensor, e.g. via an eigendecomposition, or simply by storing one triangle and copying its entries across the diagonal.

(2) You can project the weight matrix back onto your constrained subspace (where it is symmetric) after each optimization step. However, if the gradients are not symmetric (i.e. they do not satisfy the property you described above), the projection may or may not work well, because this step is hidden from the optimization algorithm.
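
The parameterization idea can be sketched roughly as follows. This is only a minimal illustration, and it assumes the constraint in question is that the 5x5 kernel equals its own transpose. The class name `SymmetricBlur` and the attribute `raw` are made up for this example; the kernel is rebuilt from the upper triangle of `raw` on every forward pass, so the optimizer never sees an asymmetric weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SymmetricBlur(nn.Module):
    """Hypothetical single-filter model whose kernel is symmetric by construction."""

    def __init__(self, kernel_size=5):
        super().__init__()
        self.padding = kernel_size // 2
        # Unconstrained parameter; only its upper triangle (incl. diagonal) is used.
        self.raw = nn.Parameter(torch.randn(kernel_size, kernel_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(1))

    def kernel(self):
        # Upper triangle plus the mirrored strictly-upper triangle gives K == K.T.
        upper = torch.triu(self.raw)
        return upper + torch.triu(self.raw, diagonal=1).T

    def forward(self, x):
        # conv2d expects a weight of shape (out_channels, in_channels, kH, kW).
        w = self.kernel().unsqueeze(0).unsqueeze(0)
        return F.conv2d(x, w, self.bias, padding=self.padding)


model = SymmetricBlur()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random stand-in data; replace with batches from your own train_loader.
main = torch.randn(4, 1, 16, 16)
blur = torch.randn(4, 1, 16, 16)

optimizer.zero_grad()
loss = F.mse_loss(model(main), blur)
loss.backward()
optimizer.step()
```

The strictly lower triangle of `raw` never influences the output, so it receives zero gradient. Gradients for mirrored kernel entries are summed automatically by autograd, which is what keeps the update consistent with the constraint.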

It seems that (1) is better, but you need to try it to know whether it works well in practice. By the way, with the first-order optimization algorithms we mostly use in deep learning, you are not guaranteed to reach the lowest loss unless your model is fairly simple. Often even a local minimum is not guaranteed.
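
For comparison, the projection approach might look something like the sketch below. It again assumes the constraint is a kernel equal to its own transpose. The helper `project_symmetric` is a hypothetical name, and the data is random stand-in data rather than a real loader. After every optimizer step the weight is replaced by the average of itself and its transpose, which is the closest symmetric matrix in the least-squares sense.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.01)


def project_symmetric(layer):
    """Hypothetical helper: overwrite the kernel with its symmetric part."""
    with torch.no_grad():
        w = layer.weight
        # The right-hand side is evaluated into a new tensor before the copy.
        w.copy_(0.5 * (w + w.transpose(-1, -2)))


# Start from a point that already satisfies the constraint.
project_symmetric(conv)

# Random stand-in data; replace with batches from your own train_loader.
main = torch.randn(4, 1, 16, 16)
blur = torch.randn(4, 1, 16, 16)

for _ in range(3):
    optimizer.zero_grad()
    loss = F.mse_loss(conv(main), blur)
    loss.backward()
    optimizer.step()
    project_symmetric(conv)  # re-impose the constraint after the update
```

With plain SGD and no momentum, projecting after the step amounts to stepping along the symmetrized gradient, because the projection is linear. With stateful optimizers such as Adam, the internal statistics are computed from the unprojected gradient, which is one way the projection stays hidden from the optimizer.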