How to correctly impose a weight constraint

I have the following model:

X_i = max(c_1i X_1, …, c_di X_d) for i=1,…,d.

This leads to a d x d matrix C of parameters. For reasons that are tedious to explain, I have a d x d matrix of estimated parameters with

X^est_i = max(c^est_1i X_1, …, c^est_di X_d)

and choose my loss to be

max_{i=1,…,d}(X^est_i / X_i)
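To make the model and the loss concrete, here is a minimal dependency-free sketch. It assumes that X^est_i is the maximum of the terms c^est_ji X_j (which is what the torch.max over axis 0 in my forward function computes) and that the loss is taken over all samples and coordinates. The function names and the 2 x 2 matrices are hypothetical and only serve as an illustration.

```python
def estimate(C, x):
    """Max-linear estimate: x_est[i] = max_j C[j][i] * x[j]."""
    d = len(x)
    return [max(C[j][i] * x[j] for j in range(d)) for i in range(d)]


def max_ratio_loss(C, samples):
    """Largest ratio x_est[i] / x[i] over all samples and coordinates."""
    return max(
        est_i / x_i
        for x in samples
        for est_i, x_i in zip(estimate(C, x), x)
    )


# Hypothetical positive data and two candidate matrices.
samples = [[2.0, 1.0], [1.0, 4.0]]
C_small = [[1.0, 0.5],
           [0.25, 1.0]]
C_large = [[1.0, 0.5],
           [0.5, 1.0]]  # same as C_small except for one larger entry

loss_small = max_ratio_loss(C_small, samples)
loss_large = max_ratio_loss(C_large, samples)
print(loss_small, loss_large)
```

In this toy example, raising a single entry raises the loss, which is the monotonicity I rely on: without a constraint on the sum, shrinking the entries is always favourable.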

Note that all values in C and in X are non-negative. This problem should be very easy to solve, as it is convex (the loss is a maximum of terms that are linear in C) and monotone (reducing a value c_ji never increases the loss).

Now I want to enforce that the sum of all entries of C is equal to (or larger than) a certain threshold; let us call it eps.

What is the best way to do this? I found three approaches, but all of them seem to fail: either the estimator gets stuck or it goes to infinity. The three approaches are:

  1. Define only d^2-1 values and assign the last value as torch.abs(eps-sum(C)), so that the sum of all values is always equal to eps.

  2. Define d^2 values and rescale them in each iteration by multiplying each weight by eps/sum(C).

  3. Add a penalty lambda*torch.abs(eps-sum(C)) to the loss; for lambda large enough, this will force the sum of the values to be eps.
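The arithmetic behind the three approaches can be sketched as follows. This is plain Python on a flat list of values rather than on the network weights, and the function names are hypothetical; it only illustrates what each mechanism does to the sum, assuming non-negative values with a positive total.

```python
def last_entry(free_values, eps):
    """Approach 1: derive the final entry from the other d^2 - 1 entries."""
    return abs(eps - sum(free_values))


def rescale(values, eps):
    """Approach 2: multiply every entry by eps / sum(values)."""
    total = sum(values)
    return [v * eps / total for v in values]


def penalty(values, eps, lam):
    """Approach 3: soft penalty lam * |eps - sum(values)| added to the loss."""
    return lam * abs(eps - sum(values))


eps = 2.0

# Approach 1 while the free entries sum to less than eps.
free_low = [0.5, 0.25, 0.25]
total_low = sum(free_low) + last_entry(free_low, eps)

# Approach 1 once the free entries sum to more than eps.
free_high = [1.0, 1.0, 1.0]
total_high = sum(free_high) + last_entry(free_high, eps)

# Approach 2 always lands exactly on eps after the rescaling step.
rescaled = rescale([1.0, 3.0], eps)

# Approach 3 is zero only when the sum equals eps.
pen = penalty([1.0, 3.0], eps, lam=10.0)

print(total_low, total_high, rescaled, pen)
```

One thing the sketch makes visible: with the absolute value in approach 1, the total is s + |eps - s|, where s is the sum of the free entries. That equals eps only while s <= eps; once s exceeds eps, the total becomes 2s - eps, so the sum is then larger than eps rather than equal to it.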

I tried all of these approaches, and each one either gets stuck or diverges to infinity.

Below is my implementation of the first approach:

import copy

import numpy as np
import scipy.stats as st
import torch
import torch.nn as nn
import torch.optim as optim

class Network(nn.Module):
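    def __init__(self, dim):
        super().__init__()
        # d-1 layers with d weights each plus one layer with d-1 weights: d^2 - 1 parameters.
        # Use the constructor argument `dim` instead of relying on a global `d`.
        self.linears = nn.ModuleList([nn.Linear(1, dim, bias=False) for _ in range(dim - 1)])
        self.final_layer = nn.Linear(1, dim - 1, bias=False)
        self.dim = dim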

    def forward(self, x): 
        d = self.dim

        for i, l in enumerate(self.linears):
            y[i,:,:] = torch.transpose(l(x[:,i].view(-1,1)),0,1)
        y[d-1,1:d,:] = torch.transpose(self.final_layer(x[:,d-1].view(-1,1)),0,1)
        y=torch.max(y, axis=0).values
        return torch.transpose(y,0,1)
    def weight_constraint(self):  
        for i, l in enumerate(self.linears):
        return reg

def custom_loss(output, target):
    loss = torch.max(output/target)
    return loss


model = Network(dim=d)


Z=np.random.lognormal( 0, 3, size=(n,d))


for i in range(n):
    for j in range(d):


optimizer = optim.LBFGS(model.parameters(), lr=0.06)

for t in range(100000):

    def closure():
        x_pred = model(torch.Tensor(X))
        loss  = custom_loss(x_pred, torch.Tensor(X))
        for i, layer in enumerate(model.linears):
            with torch.no_grad():
                model.linears[i].weight.copy_ (model.linears[i]
        with torch.no_grad():
            model.final_layer.weight.copy_ (

        return loss

#Testing if true C matrix indeed gives lower penalty

for j,l in enumerate(model2.linears):
    for i in range(d):
        with torch.no_grad():
for i in range(d-1):           
     with torch.no_grad():

loss  = custom_loss(x_pred, torch.Tensor(X))
print("Loss Value for the true C Matrix: ", loss)

You can see that the loss for the true C is just 1, but the minimum the optimizer finds is far away from 1. Let me quickly explain what I am doing.

I define (d-1) layers of size d and one layer of size (d-1), so I have exactly (d-1)*d + (d-1) = d^2-1 variables.

The forward function then calculates X^est from these layers, and weight_constraint() calculates the sum of all weights of the layers.

The rest should be basic: I generate data and run the PyTorch optimizer. At the end, I test whether I set up the forward function and the network correctly by plugging in the true C values, which indeed give a loss of 1.

Any idea how I can properly set up this weight constraint, or is it impossible?