Sum of weights (each row) is equal to 1

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_size, hidden_1_size, hidden_2_size, output_size, mask_one, mask_two):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_1_size, bias=True)
        self.fc2 = nn.Linear(hidden_1_size, hidden_2_size)
        self.fc3 = nn.Linear(hidden_2_size, output_size)

        final_layer_weights = torch.ones(size=(1,hidden_2_size), requires_grad=False) * (1/hidden_2_size)
        
        with torch.no_grad():
            self.fc1.weight.mul_(mask_one)
            self.fc2.weight.mul_(mask_two)
            self.fc3.weight = nn.Parameter(final_layer_weights)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

# Parameters
n_inputs = 5
n_hidden_1 = 5 
n_hidden_2 = 4 
n_output = 1 # OUTPUT

mask1 = torch.tensor([[1,0,0,0,0],[0,1,0,0,0],[0,0,1,0,0],[0,0,0,1,0],[0,0,0,0,1]])
mask2 = torch.tensor([[1,1,0,0,1],[0,1,1,0,1],[0,0,1,1,1],[1,0,0,1,1]])
model = Net(n_inputs, n_hidden_1, n_hidden_2, n_output, mask1, mask2)
model.fc3.requires_grad_(False)
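
# NOTE (assumption): optimizer, criterion, data, target and n_epochs are used in the
# loop below but not defined in this snippet; a minimal dummy setup, mirroring the
# training example later in the thread, could be:
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
criterion = nn.MSELoss(reduction='sum')
data = torch.randn(1, n_inputs)
target = torch.randn(1, n_output)
n_epochs = 10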

for epoch in range(n_epochs):
    optimizer.zero_grad()

    # forward pass and loss
    y_pred = model(data)
    # print(y_pred)
    loss = criterion(y_pred, target)
    
    # backward pass
    loss.backward()
    
    # Zero out the gradients of masked connections
    with torch.no_grad():
        model.fc1.weight.grad.mul_(mask1)
        model.fc2.weight.grad.mul_(mask2)


    # update
    optimizer.step()

    with torch.no_grad():
        model.fc1.weight.clamp_(min=0)

    if (epoch+1) % 1 == 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

I have the above simple NN, but I have the following constraints:

  1. I need the weights in fc1 to be all positive.
  2. I need the weights in fc2 to be all positive, with the sum of the weights in each row equal to 1.

I have realized that using clamp I can keep the weights in fc1 positive. However, I am unable to implement constraint 2 correctly.

Any help is appreciated. Thank you.

If you want each row of weights to sum to 1, sum the weights over each row and then divide the weights by that sum. That will normalize your weights row-wise so that each row sums to 1.

You could implement this via a custom nn.Module, by taking the nn.Linear source code and modifying the forward pass (a sketch is included at the end of this reply).

For example, the weight of a linear layer (without bias) with 4 inputs and 5 outputs would have the following shape:

weight = torch.randn(5,4)

# returns
tensor([[-0.0758, -0.5082, -0.0323,  0.9522],
        [-1.4773,  1.6976, -0.2631,  1.2195],
        [ 0.8877, -0.8958, -0.7618,  1.1449],
        [ 0.2790,  2.1348, -0.0403,  1.6947],
        [ 0.6023, -0.5180, -0.4327,  0.6962]])

You can normalize each row to sum to 1 as follows:

weight_norm = weight / weight.sum(dim=1, keepdim=True)
# returns
tensor([[-0.2255, -1.5130, -0.0962,  2.8348],
        [-1.2554,  1.4426, -0.2236,  1.0363],
        [ 2.3675, -2.3893, -2.0318,  3.0536],
        [ 0.0686,  0.5248, -0.0099,  0.4166],
        [ 1.7318, -1.4895, -1.2440,  2.0017]])
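
Putting this together, one possible sketch of such a custom layer (the name RowNormalizedLinear is only for illustration; it also clamps the weights to be non-negative to cover your first constraint) could be:

import torch
import torch.nn as nn
from torch.nn import functional as F

class RowNormalizedLinear(nn.Module):
    """Linear layer (no bias) whose effective weight rows are non-negative and sum to 1."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(out_features, in_features))

    def forward(self, x):
        # Clamp (out of place) so entries are non-negative, then divide each row by its
        # sum so every row of the effective weight sums to 1; the small floor avoids
        # division by zero if a row is clamped to all zeros.
        w = self.weight.clamp(min=0)
        w = w / w.sum(dim=1, keepdim=True).clamp(min=1e-8)
        return F.linear(x, w)

The raw parameter stays unconstrained and is trained as usual, while the weight actually used in the forward pass always satisfies both constraints, so no manual fix-up after optimizer.step() is needed.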

Hi @AlphaBetaGamma96

Thank you very much for your reply.

I have done the following; can you please confirm it looks right?

for epoch in range(n_epochs):
    optimizer.zero_grad()

    # forward pass and loss
    y_pred = model(data)
    # print(y_pred)
    loss = criterion(y_pred, target)
    
    # backward pass
    loss.backward()
    
    
    # Zero out the gradients of masked connections
    with torch.no_grad():
        model.fc1.weight.grad.mul_(mask1)
        model.fc2.weight.grad.mul_(mask2)

    # update
    optimizer.step()
    
    with torch.no_grad():
        model.fc1.weight.clamp_(min=0)
        model.fc2.weight = nn.Parameter(model.fc2.weight/model.fc2.weight.sum(dim=1, keepdim=True))
        model.fc2.weight.clamp_(min=0)

    if (epoch+1) % 1 == 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

Thank you.

I have a feeling re-wrapping it as nn.Parameter will cause an issue, so it's probably best to do something like:

with torch.no_grad():
    weight_norm = model.fc2.weight / model.fc2.weight.sum(dim=1, keepdim=True)
    model.fc2.weight.copy_(weight_norm)

This will normalize the weights after each update, which might cause your loss to change rapidly as you're constantly rescaling the weights. For the most stable performance I'd recommend defining a custom nn.Linear-like module that explicitly handles the weight normalization, but give this a go and see how it works.
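
For concreteness, one way the post-step block in your loop could look with this change (clamping before normalizing, so the row sums are taken over non-negative entries):

    with torch.no_grad():
        # constraint 1: keep fc1 weights non-negative
        model.fc1.weight.clamp_(min=0)
        # constraint 2: non-negative fc2 weights with each row summing to 1
        # (the small floor guards against an all-zero row after clamping)
        model.fc2.weight.clamp_(min=0)
        model.fc2.weight.div_(model.fc2.weight.sum(dim=1, keepdim=True).clamp(min=1e-8))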

Thank you very much for the help. I have followed your suggestion.

I was wondering if you could help me with some questions regarding building this NN:

import torch
from torch.nn import functional as F

class MyFirstLayer(torch.nn.Module):
    def __init__(self, inputSize, outputSize, mask, constantForBias):
        super(MyFirstLayer, self).__init__()
        self.input_size = inputSize
        self.output_size = outputSize
        self.first_layer_mask = mask
        self.constant_term = constantForBias
        
        self.weights = torch.nn.Parameter(torch.Tensor(self.output_size, self.input_size), requires_grad=False)
        self.biases = torch.nn.Parameter(torch.Tensor(1, self.output_size), requires_grad=False)

        # Initialize weights and biases
        torch.nn.init.ones_(self.weights)
        self.weights.mul_(self.first_layer_mask)
        torch.nn.init.ones_(self.biases)
        self.biases.mul_(self.constant_term)
        
    def forward(self, input):
        return F.linear(input, self.weights, self.biases)

class MySecondLayer(torch.nn.Module):
    def __init__(self, inputSize, outputSize, mask):
        super(MySecondLayer, self).__init__()
        self.input_size = inputSize
        self.output_size = outputSize
        self.second_layer_mask = mask

        self.weights = torch.nn.Parameter(torch.Tensor(self.output_size, self.input_size), requires_grad=True)

        # Initialize weights
        torch.nn.init.ones_(self.weights)
        with torch.no_grad():
            self.weights.mul_(self.second_layer_mask)

    def forward(self, input):
        weight_norm = self.weights/self.weights.sum(dim=1, keepdim=True)
        self.weights = torch.nn.Parameter((weight_norm))
        return F.linear(input, self.weights)

class MyThirdLayer(torch.nn.Module):
    def __init__(self, inputSize, outputSize, mask):
        super(MyThirdLayer, self).__init__()
        self.input_size = inputSize
        self.output_size = outputSize
        self.third_layer_mask = mask

        self.weights = torch.nn.Parameter(torch.Tensor(self.output_size, self.input_size), requires_grad=False)
        
        # Initialize weights and biases
        torch.nn.init.ones_(self.weights)
        self.weights.mul_(self.third_layer_mask)
    
    def forward(self, input):
        return F.linear(input, self.weights)

class MyNN(torch.nn.Module):
    def __init__(self, inputSize, hiddenSize1, hiddenSize2, outputSize, firstMask, secondMask, thirdMask, biasConstant):
        super(MyNN, self).__init__()
        self.fc1 = MyFirstLayer(inputSize, hiddenSize1, firstMask, biasConstant)
        self.fc2 = MySecondLayer(hiddenSize1, hiddenSize2, secondMask)
        self.fc3 = MyThirdLayer(hiddenSize2, outputSize, thirdMask)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

# Hyper Parameters
n_inputs = 5
n_hidden_1 = 10
n_hidden_2 = 8 
n_output = 2 

mask1 = torch.tensor([[1,0,0,0,0],[0,1,0,0,0],[0,0,1,0,0],[0,0,0,1,0],[0,0,0,0,1],[1,0,0,0,0],[0,1,0,0,0],[0,0,1,0,0],[0,0,0,1,0],[0,0,0,0,1]])
mask2 = torch.tensor([[1,1,0,0,1,0,0,0,0,0],[0,1,1,0,1,0,0,0,0,0],[0,0,1,1,1,0,0,0,0,0],[1,0,0,1,1,0,0,0,0,0],[0,0,0,0,0,1,1,0,0,1],[0,0,0,0,0,0,1,1,0,1],[0,0,0,0,0,0,0,1,1,1],[0,0,0,0,0,1,0,0,1,1]])
mask3 = torch.tensor([[1,1,1,1,0,0,0,0],[0,0,0,0,1,1,1,1]])

constant_for_bias = torch.tensor([[0.5]])

model = MyNN(n_inputs, n_hidden_1, n_hidden_2, n_output, mask1, mask2, mask3, constant_for_bias)

# Dummy Training Example
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
data = torch.randn(size=(1, 5))
target = torch.randn((1,2))
criterion = torch.nn.MSELoss(reduction='sum')
n_epochs = 10

for epoch in range(n_epochs):
    optimizer.zero_grad()

    # forward pass and loss
    y_pred = model(data)
    loss = criterion(y_pred, target)
    
    # backward pass
    loss.backward()
    
    # Zero out gradients
    with torch.no_grad():
        model.fc2.weights.grad.mul_(mask2)

    #update
    optimizer.step()
          
    if (epoch+1) % 1 == 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

Given the structure of the NN I mentioned above and the custom layers, is my approach correct? When I run the code, the loss doesn't decrease. I think I am on the right track, but I am going wrong somewhere.

Thank you.

I might be missing something obvious from your requirements, but why are you setting the requires_grad attribute to False for the custom parameters?
This would make them static (unless you overwrite them) and the optimizer will not update these parameters as they won’t get a valid gradient.

If you want to manipulate the parameters before applying the forward pass, you can perform the manipulation in place and wrap it in a no_grad() guard:

    def forward(self, input):
        with torch.no_grad():
            weight_norm = self.weights/self.weights.sum(dim=1, keepdim=True)
            self.weights.copy_(weight_norm)
        return F.linear(input, self.weights)

Hi,

Thank you for the reply.

There is no particular use case currently; I am just experimenting. But I would like them to be static at this point, for example under the assumption that those weights are given.

By making the change you mentioned, I am able to see that the loss decreases.

I was also wondering if I can manipulate the weights (between the hidden layers) so that the top half and bottom half are equal.

Thank you.

Yes, you should be able to “reset” the weights according to your requirements by manipulating them in place inside a no_grad() guard before they are used.
Make sure not to create new parameters if they should be trained, as you would need to pass them to the optimizer again. Also, do not manipulate them after the forward pass, as the gradient calculation would then be wrong.
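
For example, assuming “top half and bottom half equal” means that the two row halves of fc2’s weight should match, a minimal sketch could be (if the two halves have different mask patterns you would need to reapply the mask afterwards):

with torch.no_grad():
    w = model.fc2.weights                 # shape (n_hidden_2, n_hidden_1), here (8, 10)
    half = w.shape[0] // 2
    avg = 0.5 * (w[:half] + w[half:])     # element-wise average of the two row blocks
    w[:half].copy_(avg)
    w[half:].copy_(avg)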

Thank you very much.

I think I was able to achieve what I was looking for.