How to apply constraint to weights in convolutional layer?

zb48 · February 20, 2018, 5:58pm

If I have a Convolutional Neural Network (with no pooling or anything else besides the convolutional filter, because I only want to apply blurring to my images) that looks like:

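import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # a single 5x5 filter; padding=2 keeps the output the same size as the input
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2)

    def forward(self, x):
        x = self.conv1(x)
        return x

model = Net()
# the optimizer was not shown in the post; plain SGD is assumed here
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.train()
# train_loader (not shown) yields (main, blur) batches of shape (N, 1, H, W)
for batch_idx, (main, blur) in enumerate(train_loader):
    optimizer.zero_grad()
    output = model(main)
    loss = F.mse_loss(output, blur)
    loss.backward()
    optimizer.step()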
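The post doesn't show how `train_loader` is built. For experimenting with the loop above, here is a minimal sketch that synthesizes `(main, blur)` pairs by convolving random images with a fixed 5x5 Gaussian kernel. The kernel, its sigma of 1.0, the 28x28 image size, and the batch size are illustrative assumptions, not taken from the original data.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical fixed 5x5 Gaussian kernel (sigma = 1.0) used to create "blur" targets.
coords = torch.arange(5, dtype=torch.float32) - 2
g1d = torch.exp(-coords**2 / (2 * 1.0**2))
kernel = torch.outer(g1d, g1d)
kernel = (kernel / kernel.sum()).view(1, 1, 5, 5)  # (out_ch, in_ch, kH, kW), sums to 1

# Random single-channel "images" and their blurred versions.
main = torch.rand(64, 1, 28, 28)
with torch.no_grad():
    blur = F.conv2d(main, kernel, padding=2)  # same padding as conv1 in the post

train_loader = DataLoader(TensorDataset(main, blur), batch_size=16, shuffle=True)
```

Because a Gaussian kernel is itself symmetric about its diagonal, this setup also makes it easy to check whether a constrained model recovers a kernel of the same shape.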

SimonW · February 20, 2018, 6:32pm

Define how this constraint should work with asymmetrical gradients.

zb48 · February 20, 2018, 7:10pm

Why is that relevant? I just want this to work with an MSE loss.

SimonW · February 20, 2018, 7:11pm

Why is it not relevant? You need to define how the weight matrix should be updated when the gradient is asymmetrical, so that your constraint is still satisfied.
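To make the point about asymmetrical gradients concrete, here is a small sketch (random inputs and targets, chosen only for illustration) that starts from a kernel symmetric about its main diagonal and then checks whether the MSE gradient is symmetric as well. If it is not, a plain gradient step `w - lr * grad` moves the kernel out of the symmetric subspace.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

conv = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)
with torch.no_grad():
    w = conv.weight
    w.copy_((w + w.transpose(-1, -2)) / 2)  # start from a symmetric kernel

x = torch.rand(8, 1, 16, 16)
target = torch.rand(8, 1, 16, 16)  # arbitrary target; nothing forces symmetry
loss = F.mse_loss(conv(x), target)
loss.backward()

g = conv.weight.grad
print("kernel symmetric:", torch.allclose(conv.weight, conv.weight.transpose(-1, -2)))
print("gradient symmetric:", torch.allclose(g, g.transpose(-1, -2)))
```

The kernel check holds by construction; the gradient check depends on the data, and with generic inputs like these the gradient has no reason to be symmetric.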

zb48 · February 20, 2018, 7:48pm

It doesn’t really matter to me how the weight matrix gets updated, as long as it satisfies the constraint and also results in the lowest MSE loss. What do you mean by the gradient being asymmetrical?

SimonW · February 20, 2018, 11:30pm

There are two ways:

1. You can parameterize your weight tensor, e.g. with an eigendecomposition, or simply by copying the entries reflected across the diagonal.
2. You can project the weight matrix onto your constrained subspace, where it is symmetrical, after each optimization step. However, if the gradients are not symmetrical (i.e. they don't satisfy the property you described above), the projection may or may not work well, because this step is hidden from the optimization algorithm.
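A minimal sketch of the parameterization approach, using the "copy the entries reflected across the diagonal" variant: the module stores an unconstrained tensor, keeps its upper triangle, and mirrors it into the lower triangle before calling `F.conv2d`. The class name `SymmetricConv2d`, the attribute `raw_weight`, the method `symmetric_weight`, and the 0.1 initialization scale are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SymmetricConv2d(nn.Module):
    """Conv layer whose every (out, in) kernel slice is symmetric about its main diagonal.

    Only the upper triangle (including the diagonal) of `raw_weight` is used;
    the lower triangle is a mirrored copy, so the constraint holds by
    construction after every optimizer step.
    """

    def __init__(self, in_channels=1, out_channels=1, kernel_size=5, padding=2):
        super().__init__()
        self.padding = padding
        self.raw_weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.1
        )
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def symmetric_weight(self):
        upper = torch.triu(self.raw_weight)            # diagonal and above
        strict_upper = torch.triu(self.raw_weight, 1)  # strictly above the diagonal
        return upper + strict_upper.transpose(-1, -2)  # mirror into the lower triangle

    def forward(self, x):
        return F.conv2d(x, self.symmetric_weight(), self.bias, padding=self.padding)
```

Swapping `self.conv1 = SymmetricConv2d()` into `Net` leaves the training loop unchanged. Gradients flow only into the upper triangle of `raw_weight`; the lower triangle simply receives zero gradient.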

It seems that (1) is better, but you need to try it to know whether it works well in practice. By the way, with first-order optimization algorithms, which are what we mostly use in deep learning, you are not guaranteed to reach the lowest loss unless your model is fairly simple. Often not even a local minimum is guaranteed.
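For comparing against the projection approach, here is a sketch that takes an ordinary optimizer step and then replaces the kernel with `(W + W^T) / 2`, which is the closest symmetric matrix to `W` in the Frobenius norm. The helper name `project_symmetric_` and the random data are hypothetical, for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


def project_symmetric_(conv: nn.Conv2d) -> None:
    """Orthogonally project conv.weight onto kernels symmetric in their last two dims, in place."""
    with torch.no_grad():
        w = conv.weight
        w.copy_((w + w.transpose(-1, -2)) / 2)


conv = nn.Conv2d(1, 1, kernel_size=5, padding=2)
project_symmetric_(conv)  # make the starting point satisfy the constraint
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

x = torch.rand(16, 1, 28, 28)
target = torch.rand(16, 1, 28, 28)  # placeholder data
for _ in range(5):
    optimizer.zero_grad()
    loss = F.mse_loss(conv(x), target)
    loss.backward()
    optimizer.step()          # unconstrained step; may break symmetry
    project_symmetric_(conv)  # restore the constraint
```

With plain SGD (no momentum), stepping and then projecting from a symmetric starting point is the same as stepping along the symmetric part of the gradient. With momentum or Adam, the optimizer's internal state is built from the unprojected gradients, which is one concrete way the "hidden from the optimization algorithm" caveat shows up.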

zb48 · February 21, 2018, 12:11am

And how would I implement this in PyTorch? What would the code look like?
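In recent PyTorch versions (1.9 and later), the parameterization approach can also be expressed with `torch.nn.utils.parametrize.register_parametrization`, which recomputes `conv1.weight` from an unconstrained underlying tensor every time it is accessed. This is a sketch assuming that API is available; `Symmetric` is a hypothetical helper name, Adam is an arbitrary optimizer choice, and the random tensors stand in for real `(main, blur)` batches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.parametrize as parametrize


class Symmetric(nn.Module):
    """Maps an unconstrained tensor to one that is symmetric in its last two dims."""

    def forward(self, X):
        return X.triu() + X.triu(1).transpose(-1, -2)


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2)
        # From here on, self.conv1.weight is computed from an underlying
        # unconstrained tensor on every access, so it is always symmetric.
        parametrize.register_parametrization(self.conv1, "weight", Symmetric())

    def forward(self, x):
        return self.conv1(x)


model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
main = torch.rand(8, 1, 28, 28)  # placeholder input batch
blur = torch.rand(8, 1, 28, 28)  # placeholder target batch
model.train()
for _ in range(3):
    optimizer.zero_grad()
    loss = F.mse_loss(model(main), blur)
    loss.backward()
    optimizer.step()
```

The training loop itself is unchanged from the original post; only the layer definition gains the extra `register_parametrization` call.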