I have the following model:
X_i = max(c_1i X_1, …, c_di X_d) for i = 1, …, d.
This leads to a d x d matrix C of parameters. For reasons that are tedious to explain, I therefore have a d x d matrix of estimated parameters with
X^est_i = max(c^est_1i X_1, …, c^est_di X_d)
and choose my loss to be
max_{i=1,…,d}(X^est_i / X_i)
Note that all values in C and in X are non-negative. This problem should be very easy to solve, as the loss is convex in C, and reducing a value c_ji never increases it.
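For concreteness, the estimate and the loss can be sketched in NumPy as follows. This is only an illustration under assumptions: the function names max_linear_estimate and max_ratio_loss and the tiny two-sample data set are hypothetical, and the sketch assumes the estimate is a coordinate-wise maximum and that the loss takes the maximum over all samples as well as all coordinates.

```python
import numpy as np

def max_linear_estimate(C_est, X):
    """Return X_est with X_est[n, i] = max_j C_est[j, i] * X[n, j]."""
    # X[:, :, None] has shape (n, d, 1) and C_est[None, :, :] has shape (1, d, d),
    # so the product has entries X[n, j] * C_est[j, i]; the max runs over j.
    return np.max(X[:, :, None] * C_est[None, :, :], axis=1)

def max_ratio_loss(C_est, X):
    """Largest ratio X_est / X over all samples and coordinates."""
    return np.max(max_linear_estimate(C_est, X) / X)

# Hypothetical data: two samples, d = 2, all values strictly positive.
X = np.array([[1.0, 2.0], [4.0, 1.0]])
identity = np.eye(2)

print(max_ratio_loss(identity, X))        # the identity reproduces X
print(max_ratio_loss(2.0 * identity, X))  # scaling C scales the loss
```

Because every entry of C enters the loss only through products with non-negative data, lowering an entry can never raise any of the ratios.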
Now I want to enforce that the sum of all entries of C is equal to (or larger than) a certain threshold; let us call it eps.
What is the best way to do this? I found three solutions, and all of them seem to fail: the estimator either gets stuck or goes to infinity. The three approaches I thought of:
- Define only d^2-1 values and assign the last value as torch.abs(eps-sum(C)), where sum(C) is the sum of the d^2-1 free values. As long as that sum does not exceed eps, the sum of all values equals eps.
- Define d^2 values and rescale them in each iteration by multiplying each weight by eps/sum(C).
- Add a penalty lambda*torch.abs(eps-sum(C)) to the loss. For lambda large enough, this should force the sum of the values to be eps.
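The second and third approaches can be sketched in isolation as follows. This is a minimal illustration under assumptions, not the implementation discussed in this post: the names C_raw, rescale_to_sum, and sum_penalty, and the values d = 3, eps = 3.8, and lam = 10 are hypothetical.

```python
import torch

def rescale_to_sum(C, eps):
    """Approach 2: clamp to non-negative values, then rescale so the entries sum to eps."""
    C = C.clamp(min=0)
    return C * (eps / C.sum())

def sum_penalty(C, eps, lam):
    """Approach 3: penalty term lam * |eps - sum(C)| to be added to the loss."""
    return lam * torch.abs(eps - C.sum())

torch.manual_seed(0)
d, eps = 3, 3.8
C_raw = torch.rand(d, d, requires_grad=True)  # hypothetical starting values

C_scaled = rescale_to_sum(C_raw, eps)
print(C_scaled.sum())  # equals eps up to rounding

penalty = sum_penalty(C_raw, eps, lam=10.0)
penalty.backward()
print(C_raw.grad)      # every entry is +lam or -lam
```

The gradient of the absolute-value penalty has the same magnitude lam for every entry no matter how close the sum already is to eps, so it does not shrink as the constraint is approached.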
So I tried all of these approaches, but all of them seem to fail: either they get stuck or they diverge to infinity.
Below is my implementation of the first approach:
import torch
import torch.nn as nn
import numpy as np
import scipy.stats as st
import torch.optim as optim
import copy


class Network(nn.Module):
    def __init__(self, dim):
        super(Network, self).__init__()
        d = dim
        # d-1 layers with d weights each plus one layer with d-1 weights:
        # d^2-1 free parameters in total.
        self.linears = nn.ModuleList([nn.Linear(1, d, bias=False) for i in range(d-1)])
        self.final_layer = nn.Linear(1, (d-1), bias=False)
        self.dim = dim

    def forward(self, x):
        d = self.dim
        # y[i, j, n] holds c_ij * x[n, i].
        y = torch.zeros((d, d, x.size()[0]))
        for i, l in enumerate(self.linears):
            y[i,:,:] = torch.transpose(l(x[:,i].view(-1,1)), 0, 1)
        y[d-1,1:d,:] = torch.transpose(self.final_layer(x[:,d-1].view(-1,1)), 0, 1)
        # The remaining entry c_{d-1,0} is derived from the sum constraint
        # (lambda1 is the global threshold defined further down).
        reg = self.weight_constraint()
        y[d-1,0,:] = torch.abs(reg-lambda1)*x[:,d-1]
        # The maximum over i gives the estimate of each coordinate j.
        y = torch.max(y, axis=0).values
        return torch.transpose(y, 0, 1)

    def weight_constraint(self):
        # Sum of all d^2-1 free weights.
        reg = 0
        for i, l in enumerate(self.linears):
            reg += torch.sum(l.weight)
        reg += torch.sum(self.final_layer.weight)
        return reg


def custom_loss(output, target):
    # Largest ratio X^est / X over all samples and coordinates.
    loss = torch.max(output/target)
    return loss


np.random.seed(seed=1)
torch.manual_seed(1)
d = 3
n = 100
model = Network(dim=d)

# Generate data from the true C.
C = np.array([[1,0.5,0.3],[0,1,0],[0,0,1]])
Z = np.random.lognormal(0, 3, size=(n,d))
X = np.zeros((n,d))
for i in range(n):
    for j in range(d):
        X[i,j] = np.max(C[:,j]*Z[i,:])

# Threshold eps: the sum of all entries of the true C.
lambda1 = 3+0.5+0.3
optimizer = optim.LBFGS(model.parameters(), lr=0.06)

for t in range(100000):
    def closure():
        x_pred = model(torch.Tensor(X))
        optimizer.zero_grad()
        loss = custom_loss(x_pred, torch.Tensor(X))
        loss.backward()
        # Keep all weights non-negative.
        for i, layer in enumerate(model.linears):
            with torch.no_grad():
                model.linears[i].weight.copy_(model.linears[i].weight.data.clamp(min=0))
        with torch.no_grad():
            model.final_layer.weight.copy_(model.final_layer.weight.data.clamp(min=0))
        print(loss)
        return loss
    optimizer.step(closure)

# Testing if true C matrix indeed gives lower penalty
model2 = copy.deepcopy(model)
for j, l in enumerate(model2.linears):
    for i in range(d):
        with torch.no_grad():
            l.weight[i] = C[j,i]
for i in range(d-1):
    with torch.no_grad():
        model2.final_layer.weight[i] = C[d-1,i+1]
x_pred = model2(torch.Tensor(X))
loss = custom_loss(x_pred, torch.Tensor(X))
print("Loss Value for the true C Matrix: ", loss)
You can see that the loss for the true C is just 1, but the minimum the optimizer finds is far away from 1. Let me quickly explain what I am doing.
I define (d-1) layers of size d and one layer of size (d-1), so I have exactly d^2-1 variables.
The forward function then computes X^est from these layers, and weight_constraint() computes the sum of all weights in these layers.
The rest should be basic: I generate data and run the PyTorch optimizer. At the end, I check that I set up the forward function and the network correctly by plugging in the true C values, which indeed give a loss of 1.
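The construction with d^2-1 free values can be sketched more compactly as follows. The helper assemble_C is hypothetical, and for brevity it places the derived entry last rather than at position (d-1, 0) as in the network above. The two calls suggest that the entries sum to eps only while the free values sum to at most eps.

```python
import torch

def assemble_C(free, eps, d):
    """Build a d x d matrix from d*d - 1 free entries plus one derived entry.

    The derived entry is |eps - sum(free)|, placed last (an assumption of this sketch).
    """
    last = torch.abs(eps - free.sum()).reshape(1)
    return torch.cat([free, last]).reshape(d, d)

d, eps = 2, 2.0

# Free values sum to 1.5 <= eps: the derived entry is 0.5.
C_ok = assemble_C(torch.tensor([0.5, 0.5, 0.5]), eps, d)
print(C_ok.sum().item())   # 2.0, equal to eps

# Free values sum to 3.0 > eps: the derived entry is 1.0.
C_bad = assemble_C(torch.tensor([1.0, 1.0, 1.0]), eps, d)
print(C_bad.sum().item())  # 4.0, larger than eps
```

In the second case the absolute value keeps the derived entry non-negative, but the total then exceeds eps, and increasing the free values increases the derived entry as well.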
Any idea how I can properly set up this weight constraint, or is it impossible?