# "Training" variables to do SVD

In a bid to get familiar with PyTorch syntax, I thought I’d try to use gradient descent to do SVD. Not just the standard SVD routine, though: I want to do multidimensional scaling (MDS), which requires SVD.

Essentially, I generated a random `n x n` matrix `U`, a random diagonal `n x n` matrix `s`, and a random `n x n` matrix `Vh`, just as a starting point. The goal is for `U s Vh.T` to approximate some matrix `B`, i.e. `U s Vh.T ~ B`, with `U`, `s` and `Vh` playing the role of the SVD factors of `B`.
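For comparison, the factors that gradient descent is supposed to approach can be computed directly. The sketch below assumes `torch.linalg.svd`, which returns `U, S, Vh` such that `B = U @ diag(S) @ Vh` (a slightly different convention from the `U s Vh.T` used in this thread). The names `U_ref`, `S_ref`, `Vh_ref`, `B_rec` and `X_ref` are made up for illustration. Note also that classical MDS is usually written with a factor of `-0.5` in the double centering; the `0.5` from the thread is kept here as posted.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

N = 15
D = torch.rand(N, N)                      # stand-in "distance" matrix, as in the thread
H = torch.eye(N) - torch.ones(N, N) / N   # centering matrix
B = 0.5 * H @ (D ** 2) @ H                # same formula as the thread

# Reference factorization: B = U_ref @ diag(S_ref) @ Vh_ref
U_ref, S_ref, Vh_ref = torch.linalg.svd(B)
B_rec = U_ref @ torch.diag(S_ref) @ Vh_ref

# Embedding built like the thread's sqrt(s) @ Vh, keeping the first 3 coordinates
X_ref = (torch.sqrt(S_ref).unsqueeze(1) * Vh_ref).T[:, :3]

print(torch.dist(B_rec, B))  # reconstruction error
print(X_ref.shape)
```

Comparing the learned `U`, `s`, `Vh` against such a reference is only meaningful up to sign and ordering of the singular vectors, so the reconstruction error is probably the easier quantity to check.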

I guess I’m running into two rookie pitfalls: (1) the loss is not updating after the first iteration (why?), and (2) is it possible to “combine” two loss functions? Below, I “combine” the loss of the SVD approximation with the loss of the MDS approximation:

```python
# load into pytorch
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
    1/N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D**2), H)
pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# emulate simulate SVD

# find embedding
embed = torch.matmul(torch.sqrt(s), Vh)
X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z

for i in range(100000):

    # calculate loss1: how close is our SVD function?
    delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
             (torch.matmul(V.T, V) - torch.eye(N)) + \
             (torch.matmul(torch.matmul(U, s), Vh.T) - B)

    L1 = torch.norm(delta1, p=2)

    # calculate loss2: how close is our MDS approximation?
    delta2 = torch.nan_to_num(X_hat - pts, nan=0.0)

    L2 = torch.norm(delta2, p=2)

    # Backprop
    loss = L1 + L2
    loss.backward()

    # update

    U.data.data.zero_()
    s.data.data.zero_()
    V.data.data.zero_()

    if i % 1000 == 0:
        print('Loss is %s at iteration %i' % (loss, i))

    if abs(loss) < stop_loss:
        break
```

So, some hints.
`Variable` was deprecated a couple of years ago. You don’t need to access `.data`; that just performs in-place modifications of the tensor.
You can define an optimizer and pass it the tensors.

Some corrections would be:

```python
import torch

N = 15
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
    1 / N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D ** 2), H)
pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# emulate simulate SVD
# (the tensor definitions are missing from the post; the three lines below are an
# assumed reconstruction using plain tensors with requires_grad=True, not Variable)
U = torch.rand(N, N, requires_grad=True)
s = torch.diag(torch.rand(N)).requires_grad_()
Vh = torch.rand(N, N, requires_grad=True)

optimizer = torch.optim.SGD([U, s, Vh], lr=step_size)
# find embedding
embed = torch.matmul(torch.sqrt(s), Vh)
X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z

for i in range(100000):

    # calculate loss1: how close is our SVD function?
    delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
             (torch.matmul(Vh.T, Vh) - torch.eye(N)) + \
             (torch.matmul(torch.matmul(U, s), Vh.T) - B)

    L1 = torch.norm(delta1, p=2)

    # calculate loss2: how close is our MDS approximation?
    delta2 = torch.nan_to_num(X_hat - pts, nan=0.0)

    L2 = torch.norm(delta2, p=2)

    # Backprop
    loss = L1 + L2
    loss.backward()

    # update
    optimizer.step()

    if i % 1000 == 0:
        print('Loss is %s at iteration %i' % (loss, i))

    if abs(loss) < stop_loss:
        break
```

If you want to do your own optimization, you should zero the gradients at each step.
Lastly, I don’t know whether `torch.nan_to_num(X_hat - pts, nan=0.0)` is backpropagable or not, but I cannot imagine why you would have NaNs.
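On the first point, a hand-rolled update could look roughly like the sketch below. It is only an illustration under a few assumptions: `B` is a random stand-in target, `s` is stored as a vector of singular values rather than a full diagonal matrix, and only the reconstruction term of the loss is used. The key detail is that the gradient (`p.grad`) is what gets zeroed, while the tensor itself is only nudged by the step.

```python
import torch

torch.manual_seed(0)
N = 5
B = torch.rand(N, N)  # hypothetical target matrix

# Leaf tensors tracked by autograd (names follow the thread)
U = torch.rand(N, N, requires_grad=True)
s = torch.rand(N, requires_grad=True)  # assumption: singular values kept as a vector
Vh = torch.rand(N, N, requires_grad=True)

step_size = 1e-2
losses = []
for i in range(200):
    loss = torch.norm(U @ torch.diag(s) @ Vh.T - B)
    loss.backward()
    with torch.no_grad():  # the update itself must not be recorded by autograd
        for p in (U, s, Vh):
            p -= step_size * p.grad  # gradient step on the values
            p.grad.zero_()           # reset the gradient, not the tensor
    losses.append(loss.item())

print(losses[0], losses[-1])
```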

```python
U.data.data.zero_()
s.data.data.zero_()
V.data.data.zero_()
```

This sets the values of `U`, `s` and `V` to zero on each iteration, so the result will be static because you are “resetting” the tensors. And yes, you can combine the losses as `L = L1 + L2`.
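As a rough sketch of combining losses: autograd differentiates a sum of scalar losses like any other expression, so a (possibly weighted) sum can be passed to `backward()` directly. One variation that may be worth considering is to give each residual its own norm instead of adding the residual matrices together before taking a single norm, since summed residuals of opposite sign can cancel. The weight `w_ortho`, the random target `B`, and the vector form of `s` are assumptions made for this example.

```python
import torch

torch.manual_seed(0)
N = 5
B = torch.rand(N, N)  # hypothetical target matrix
U = torch.rand(N, N, requires_grad=True)
s = torch.rand(N, requires_grad=True)  # assumption: singular values kept as a vector
Vh = torch.rand(N, N, requires_grad=True)
I = torch.eye(N)

# One scalar penalty per constraint
ortho_U = torch.norm(U.T @ U - I)    # how far U is from orthogonal
ortho_V = torch.norm(Vh.T @ Vh - I)  # how far Vh is from orthogonal
recon = torch.norm(U @ torch.diag(s) @ Vh.T - B)

w_ortho = 1.0  # hypothetical weight balancing the terms
loss = recon + w_ortho * (ortho_U + ortho_V)
loss.backward()  # gradients of the combined loss land in U.grad, s.grad, Vh.grad
```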


I see! Thank you so much for pointing this out!

Even when I declare an optimizer to step through like your suggestion above, the loss does not change. Is there a common step I am forgetting here?

```python
import torch

N=15
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
    1 / N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D ** 2), H)
# pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# # emulate simulate SVD
# U = torch.rand(N, N, requires_grad=True)
# Vh = torch.rand(N, N, requires_grad=True)

optimizer = torch.optim.SGD([U,s,Vh], lr=step_size)
# find embedding
# embed = torch.matmul(torch.sqrt(s), Vh)
# X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z

for i in range(100000):

    # calculate loss1: how close is our SVD function?
    delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
             (torch.matmul(Vh.T, Vh) - torch.eye(N)) + \
             (torch.matmul(torch.matmul(U, s), Vh.T) - B)

    L1 = torch.norm(delta1, p=2)

    # calculate loss2: how close is our MDS approximation?
    # # delta2 = torch.nan_to_num(X_hat - pts, nan=0.0)

    # # L2 = torch.norm(delta2, p=2)

    # Backprop
    loss = L1
    loss.backward()

    # update
    optimizer.step()

    if i % 1000 == 0:
        print('Loss is %s at iteration %i' % (loss, i))

    if abs(loss) < stop_loss:
        break
```

Output:

```
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 0
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 1000
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 2000
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 3000
Loss is tensor(141.6570, grad_fn=<NormBackward1>) at iteration 4000
...
...
etc.
```

I see your point. Sometimes my numpy array `pts` has missing rows; my toy exercise is to try and recover those points. It looks something like:

```python
>>> pts
array([[1327.769     ,  922.555     ,   86.56067961],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [1327.34660274,  921.29142466,   78.19481833],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan],
       [1316.07260274,  920.69542466,   85.07831347]])
```

I’m trying to set the loss on those missing rows to zero (i.e. `torch.nan_to_num(X_hat - pts, nan=0.0)`), while taking the real L2 norm on the non-missing values. Is there a function that allows PyTorch to consider `nan` values in its loss as 0?
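One possible way to get that behaviour, sketched here under assumptions, is to mask the missing rows out before computing the norm, so that NaNs never enter the autograd graph at all. The small `pts` tensor, the zero-initialised `X_hat`, and the name `observed` are all made up for the example.

```python
import torch

# Toy data: rows 1 and 3 are missing (all NaN)
pts = torch.tensor([[1.0, 2.0, 3.0],
                    [float('nan')] * 3,
                    [4.0, 5.0, 6.0],
                    [float('nan')] * 3])
X_hat = torch.zeros(4, 3, requires_grad=True)  # stand-in for the learned embedding

# True for rows that contain real coordinates
observed = ~torch.isnan(pts).any(dim=1)

# L2 norm over the observed rows only
loss = torch.norm(X_hat[observed] - pts[observed])
loss.backward()

print(X_hat.grad)  # rows for missing points receive zero gradient
```

With this masking the missing rows contribute nothing to the loss, which also means this term gives them no signal; they would have to be constrained by the other loss term.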

So as long as the gradient is not NaN or None, I guess it works for any values but those.
Did you manage to get the tensors updated?

Doesn’t seem like it. Two comments above, I commented out the second loss function and only calculated the first to avoid this problem. But still, the loss remains constant!

Sorry, my bad. `step()` should be called before zeroing the gradients:

```python
optimizer.step()
optimizer.zero_grad()
```
```python
import torch
import matplotlib.pyplot as plt

N = 5
D = torch.rand(N, N).float()
H = torch.eye(N).float() - \
    1 / N * torch.ones(D.shape).float()
B = 0.5 * torch.matmul(torch.matmul(H, D ** 2), H)
pts = torch.rand(N, 3).float()

# declare constants
stop_loss = 1e-2
step_size = stop_loss / 3

# emulate simulate SVD
# (the tensor definitions are missing from the post; the three lines below are an
# assumed reconstruction using plain tensors with requires_grad=True)
U = torch.rand(N, N, requires_grad=True)
s = torch.diag(torch.rand(N)).requires_grad_()
Vh = torch.rand(N, N, requires_grad=True)

optimizer = torch.optim.SGD([U, s, Vh], lr=step_size)
# find embedding
embed = torch.matmul(torch.sqrt(s), Vh)
print(embed)
X_hat = embed.T[:, :3]  # select the first 3 coordinates --> x, y, z
loss_hist = []
for i in range(10):
    print(f'Iteration {i}:')
    print(f'\t U_s:{U.sum()}, s:{s.sum()}, Vh_s:{Vh.sum()}')
    # calculate loss1: how close is our SVD function?
    delta1 = (torch.matmul(U.T, U) - torch.eye(N)) + \
             (torch.matmul(Vh.T, Vh) - torch.eye(N)) + \
             (torch.matmul(torch.matmul(U, s), Vh.T) - B)

    L1 = torch.norm(delta1, p=2)

    # calculate loss2: how close is our MDS approximation?
    delta2 = X_hat - pts

    L2 = torch.norm(delta2, p=2)
    print(f'\t L1: {L1}, L2: {L2}')
    # Backprop (only L1 is optimized here; L2 is just printed)
    loss = L1
    loss.backward()
    loss_hist.append(loss.item())
    # update: apply the gradients, then clear them for the next iteration
    optimizer.step()
    optimizer.zero_grad()
```

Still, if I add L2 there are NaNs, but I guess that’s about the maths (which I didn’t check).
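One plausible source of those NaNs, offered as a guess rather than a confirmed diagnosis: if `s` is a full diagonal matrix that requires gradients, `torch.sqrt(s)` is also applied to the off-diagonal zeros, and the derivative of the square root is unbounded at zero. The sketch below compares that with keeping the singular values as a strictly positive vector. The `+ 0.1` offset and the names `s_mat` and `s_vec` are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
N = 5
Vh = torch.rand(N, N)
pts = torch.rand(N, 3)

# (1) s as a full diagonal matrix: the off-diagonal zeros pass through sqrt
s_mat = torch.diag(torch.rand(N) + 0.1).requires_grad_()
X_hat = torch.matmul(torch.sqrt(s_mat), Vh).T[:, :3]
torch.norm(X_hat - pts).backward()
matrix_grad_finite = bool(torch.isfinite(s_mat.grad).all())

# (2) s as a vector of strictly positive values: sqrt never sees a zero
s_vec = (torch.rand(N) + 0.1).requires_grad_()
X_hat = (torch.sqrt(s_vec).unsqueeze(1) * Vh).T[:, :3]
torch.norm(X_hat - pts).backward()
vector_grad_finite = bool(torch.isfinite(s_vec.grad).all())

print(matrix_grad_finite, vector_grad_finite)
```

If L2 is added to the loss, the embedding would also need to be recomputed inside the loop, so that every `backward()` call sees a freshly built graph that reflects the current values of `s` and `Vh`.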


I see…! The order matters a lot. Thank you for this feedback, I learned a lot.
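To see the effect of the ordering in isolation, here is a small sketch on a toy parameter. The helper `one_update` and the quadratic loss are hypothetical; the point is only that clearing the gradients before `step()` leaves the optimizer with nothing to apply.

```python
import torch

def one_update(zero_first):
    """Run one SGD update and report whether the parameter stayed unchanged."""
    torch.manual_seed(0)
    w = torch.rand(3, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)
    before = w.detach().clone()

    loss = (w ** 2).sum()
    loss.backward()

    if zero_first:
        opt.zero_grad()  # gradient is discarded before it is used
        opt.step()
    else:
        opt.step()       # gradient is applied
        opt.zero_grad()  # then cleared for the next iteration

    return torch.equal(before, w.detach())

unchanged_when_zero_first = one_update(zero_first=True)
unchanged_when_step_first = one_update(zero_first=False)
print(unchanged_when_zero_first, unchanged_when_step_first)
```

Calling `optimizer.zero_grad()` at the very top of each iteration, before `loss.backward()`, is an equally common arrangement; what matters is that nothing clears the gradients between `backward()` and `step()`.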
