# "Training" variables to do SVD

In a bid to get familiar with PyTorch syntax, I thought I’d try and see if I can use gradient descent to do SVD - but not just the standard SVD routine, instead multidimensional scaling (MDS) which requires SVD.

Essentially, I generated a random n x n matrix U, a random diagonal n x n matrix s, and a random n x n matrix Vh, just as a starting point. The goal is for U s Vh.T to approximate some matrix B ie. U s Vh.T ~ SVD(B).

I guess I’m running into two rookie pitfalls: (1) the loss is not updating after the first iteration (why?) (2) is it possible to “combine” two loss functions? Below, I “combine” the loss of the SVD approximation to the loss of the MDS approximation:

D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
1/N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D**2), H)
pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# emulate simulate SVD

# find embedding
embed = torch.matmul(torch.sqrt(s), Vh)
X_hat = embed.T[:, :3] # select the first 3 coordinates --> x, y, z

for i in range(100000):

# calculate loss1: how close is our SVD function?
delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
(torch.matmul(V.T, V) - torch.eye(N)) + \
(torch.matmul(torch.matmul(U, s), Vh.T) - B)

L1 = torch.norm(delta1, p=2)

# calculate loss2: how close is our MDS approximation?
delta2 = torch.nan_to_num(X_hat - pts, nan=0.0)

L2 = torch.norm(delta2, p=2)

# Backprop
loss = L1 + L2
loss.backward()

# update

U.data.data.zero_()
s.data.data.zero_()
V.data.data.zero_()

if i % 1000 == 0:
print('Loss is %s at iteration %i' % (loss, i))

if abs(loss) < stop_loss:
break

Soo some hints.
Variable was deprecated couple of years ago
You don’t need to acess .data That just computes in-place modifications of the tensor.
You can define an optimizer and to pass tensors:

Some corrections would be

import torch
N=15
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
1 / N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D ** 2), H)
pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# emulate simulate SVD

optimizer = torch.optim.SGD([U,s,Vh], lr=step_size)
# find embedding
embed = torch.matmul(torch.sqrt(s), Vh)
X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z

for i in range(100000):

# calculate loss1: how close is our SVD function?
delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
(torch.matmul(V.T, V) - torch.eye(N)) + \
(torch.matmul(torch.matmul(U, s), Vh.T) - B)

L1 = torch.norm(delta1, p=2)

# calculate loss2: how close is our MDS approximation?
delta2 = torch.nan_to_num(X_hat - pts, nan=0.0)

L2 = torch.norm(delta2, p=2)

# Backprop
loss = L1 + L2
loss.backward()

# update
optimizer.step()

if i % 1000 == 0:
print('Loss is %s at iteration %i' % (loss, i))

if abs(loss) < stop_loss:
break

If you want to do your own optimization you should zeroe the gradients each step.
lastly, I don’t know whether this function is backpropagable or not torch.nan_to_num(X_hat - pts, nan=0.0) But I cannot imagine why you would have NaNs

U.data.data.zero_()
s.data.data.zero_()
V.data.data.zero_()

This is making zero the values of U,s and V each iteration. Thus the result will be static cos you are “resetting” the tensor. And yes, you can combine L=L1+L2

1 Like

I see! Thank you so much for pointing this out!

Even when I declare an optimizer to step through like your suggestion above, the loss does not change. Is there a common step I am forgetting here?

import torch

N=15
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
1 / N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D ** 2), H)
# pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# # emulate simulate SVD
# U = torch.rand(N, N, requires_grad=True)
# Vh = torch.rand(N, N, requires_grad=True)

optimizer = torch.optim.SGD([U,s,Vh], lr=step_size)
# find embedding
# embed = torch.matmul(torch.sqrt(s), Vh)
# X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z

for i in range(100000):

# calculate loss1: how close is our SVD function?
delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
(torch.matmul(Vh.T, Vh) - torch.eye(N)) + \
(torch.matmul(torch.matmul(U, s), Vh.T) - B)

L1 = torch.norm(delta1, p=2)

# calculate loss2: how close is our MDS approximation?
# # delta2 = torch.nan_to_num(X_hat - pts, nan=0.0)

# # L2 = torch.norm(delta2, p=2)

# Backprop
loss = L1
loss.backward()

# update
optimizer.step()

if i % 1000 == 0:
print('Loss is %s at iteration %i' % (loss, i))

if abs(loss) < stop_loss:
break

Output:

Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 0
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 1000
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 2000
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 3000
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 4000
...
...
etc.

I see your point. Sometimes my numpy array pts have missing rows; my toy exercise is to try and recover those points. It looks something like:

pts
>>>array([[1327.769     ,  922.555     ,   86.56067961],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[1327.34660274,  921.29142466,   78.19481833],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[          nan,           nan,           nan],
[1316.07260274,  920.69542466,   85.07831347]])

I’m trying to set the loss on those missing rows to zero (ie. torch.nan_to_num(X_hat - pts, nan=0.0), while taking the real L2 norm on those non-missing values. Is there a function that allows PyTorch to consider nan values in its loss as 0?

So if the gradient is not NaN or None I guess it works for any value but those ones.
Could you manage to get the tensors updated?

Doesn’t seem like it. In 2 comments above, I commented out the second loss function, and only calculated the first to avoid this problem. But still, the loss remains constant!

Step should be called before zeroing the grad

optimizer.step()
import torch
import matplotlib.pyplot as plt

N=5
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
1 / N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D ** 2), H)
pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# # emulate simulate SVD

optimizer = torch.optim.SGD([U,s,Vh], lr=step_size)
# find embedding
embed = torch.matmul(torch.sqrt(s), Vh)
print(embed)
X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z
loss_hist = []
for i in range(10):
print(f'Iteration {i}:')
print(f'\t U_s:{U.sum()}, s:{s.sum()}, Vh_s:{Vh.sum()}')
# calculate loss1: how close is our SVD function?
delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
(torch.matmul(Vh.T, Vh) - torch.eye(N)) + \
(torch.matmul(torch.matmul(U, s), Vh.T) - B)

L1 = torch.norm(delta1, p=2)

# calculate loss2: how close is our MDS approximation?
delta2 = X_hat - pts
#
L2 = torch.norm(delta2, p=2)
print(f'\t L1: {L1}, L2: {L2}')
# Backprop
loss = L1
loss.backward()
loss_hist.append(loss.item())
# update

Still if I add L2 there are NaNs but guess that’s about the maths (which i didn’t check)

1 Like

I see…! The order matters a lot. Thank you for this feedback, I learned a lot.

1 Like