import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 128, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x

x = torch.randn((128, 128))
y = torch.randn((128, 128))

# closed-form least-squares solution
U = torch.linalg.lstsq(x, y).solution
pred = x @ U
print(f'norm: {torch.norm(pred - y)}')

# same problem, learned with gradient descent
net = Net()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(1000):
    pred = net(x)
    loss = torch.norm(pred - y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f'loss: {loss.item()}')

I initialize two random matrices x and y and try to find a transformation matrix U such that x @ U = y.
The least-squares solution attains a low norm (loss). I then try to solve the same problem by learning U, but I can't get it to converge to a low loss, and the learned estimate of U performs poorly. Is there any way to learn a better estimate of U with a learning algorithm? I also tried defining the forward pass of Net as follows:

    def forward(self, x):
        x = x @ self.fc1.weight
        return x
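
For reference, nn.Linear applies the transpose of its weight, i.e. fc1(x) equals x @ fc1.weight.T when there is no bias, so the two forward passes above parameterize U differently (one as the weight, the other as its transpose). A minimal sketch that checks this; the small batch size and the seed are arbitrary assumptions made only for illustration:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

fc = torch.nn.Linear(128, 128, bias=False)
x = torch.randn(4, 128)  # small illustrative batch

out_linear = fc(x)               # what nn.Linear computes
out_transposed = x @ fc.weight.T # same result, written explicitly
out_plain = x @ fc.weight        # the alternative forward pass

print(torch.allclose(out_linear, out_transposed, atol=1e-5))
print(torch.allclose(out_linear, out_plain, atol=1e-5))
```

Either parameterization can represent the same U; the difference only matters when comparing the learned weight against a least-squares solution, where the transpose has to be taken into account.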

Your x and y vectors are sampled independently. Therefore, on a
statistical basis, they have nothing to do with one another, so you can’t
really regress one against the other.

For your specific values of x and y that you have sampled, some of the
statistical variations they have will happen – just by chance – to line up,
so you will be able to regress the concrete values of y against x, but it
will be hard to do (and if you did so again with a new set of samples for x and y, you would get entirely different values for the regression
coefficients).

A consequence of this is that your specific concrete regression problem
is ill-conditioned, as can be seen from the large condition number of
your matrix U.
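
To make "condition number" concrete: with its default arguments, torch.linalg.cond returns the 2-norm condition number, which is the ratio of the largest to the smallest singular value. A small sketch that computes it both ways for a U obtained as in your post (the seed and float64 are assumptions chosen for illustration):

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

x = torch.randn(128, 128, dtype=torch.float64)
y = torch.randn(128, 128, dtype=torch.float64)
U = torch.linalg.lstsq(x, y).solution

# singular values of U; the condition number is max / min
sv = torch.linalg.svdvals(U)
cond_from_svd = (sv.max() / sv.min()).item()
cond_builtin = torch.linalg.cond(U).item()

print(f'cond from singular values: {cond_from_svd:.2f}')
print(f'torch.linalg.cond:         {cond_builtin:.2f}')
```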

I’ve tweaked and expanded the code you posted to illustrate what is
going on. First, you can train, but doing so is difficult and slow.

I then construct a rather similar regression problem where the tensor t does depend on the tensor s, so the regression problem makes
sense and is not ill-conditioned. Your Net model trains just fine on
this regression problem and produces an accurate result.
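
As a stripped-down sketch of the idea (without noise, and with float64 and the name V_true as my own assumptions): when t is built from s by an actual linear map, least squares recovers that map.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

s = torch.randn(128, 128, dtype=torch.float64)
V_true = torch.randn(128, 128, dtype=torch.float64)  # the "true" map (hypothetical name)
t = s @ V_true  # t depends on s by construction

# least squares should recover V_true up to round-off
V = torch.linalg.lstsq(s, t).solution
max_err = (V - V_true).abs().max().item()
print(f'max abs error: {max_err:.3e}')
```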

Here is the example script:

import torch
print (torch.__version__)

_ = torch.manual_seed (2023)

class Net (torch.nn.Module):
    def __init__ (self):
        super (Net, self).__init__()
        self.fc1 = torch.nn.Linear (128, 128, bias = False)
    def forward (self, x):
        x = self.fc1 (x)
        return x

print ('original ill-conditioned regression')

x = torch.randn ((128, 128))
y = torch.randn ((128, 128))
U = torch.linalg.lstsq (x, y).solution
pred = x @ U
print (f'norm: {torch.norm (pred - y)}')
print (f'cond (U): {torch.linalg.cond (U)}')   # large condition number -- problem is ill-conditioned

genericV = torch.randn (128, 128)
print (f'cond (genericV): {torch.linalg.cond (genericV)}')   # "generic" value of condition number

net = Net()
# optimizer = torch.optim.Adam (net.parameters(), lr = 1e-3)
optimizer = torch.optim.Adam (net.parameters(), lr = 2e-3)
for i in range (1000001):
    pred = net (x)
    loss = torch.norm (pred - y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i % 100000 == 0:
        diff = (net.fc1.weight.T - U).abs().mean()
        print ('i: %7d   loss: %8.3f   cond (weight): %9.2f   diff: %9.3e' % (i, loss.item(), torch.linalg.cond (net.fc1.weight), diff))

# create non-ill-conditioned regression
print ('non-ill-conditioned regression -- because of noise, loss will not go to zero')

s = torch.randn (128, 128)
t = s @ genericV
# add some noise
s = s * (1.0 + 0.001 * torch.randn (128, 128))
t = t * (1.0 + 0.001 * torch.randn (128, 128))
V = torch.linalg.lstsq (s, t).solution
pred = s @ V
print (f'norm: {torch.norm (pred - t)}')       # norm (loss) is not zero
print (f'cond (V): {torch.linalg.cond (V)}')   # problem is not ill-conditioned

net = Net()
optimizer = torch.optim.Adam (net.parameters(), lr = 2e-4)
for i in range (200001):
    pred = net (s)
    loss = torch.norm (pred - t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i % 20000 == 0:
        diff = (net.fc1.weight.T - V).abs().mean()
        print ('i: %7d   loss: %8.3f   cond (weight): %9.2f   diff: %9.3e' % (i, loss.item(), torch.linalg.cond (net.fc1.weight), diff))

Hi, many thanks for your answer! It was very helpful. The code I attached here was just an example (and now I understand why it was a bad one). What I am actually trying to learn is a U that maps between the weight matrices of the same model trained with different seeds, and I can’t get the loss down. When I now look at the condition number of the U matrix obtained by least squares, I get a very high value. So are you saying that the condition number of the least-squares U indicates the stability of the learning process? Is there anything that can be done to improve stability?