Hi all! I want to try a weird idea: using gradient descent to make one normalized matrix A ‘closer’ to another normalized matrix B, i.e., to minimize the mean squared error between A (which can be viewed as the output) and B (which can be viewed as the target). So I wrote the following toy code:
import torch
from torch.autograd import Variable
import numpy as np
np.random.seed(0)
torch.manual_seed(0)
a = np.random.randint(0, 255, (5, 5)).astype(np.float32)
a = Variable(torch.from_numpy(a), requires_grad=True)
b = np.random.randint(0, 255, (5, 5)).astype(np.float32)
b = Variable(torch.from_numpy(b))
for _ in range(10):
    a_max = a.max().repeat(a.size())
    c = a / a_max
    b_max = b.max().repeat(b.size())
    d = b / b_max
    loss = torch.nn.MSELoss()(c, d)
    print(loss.data[0])
    loss.backward()
    a.data -= 1000000 * a.grad.data
Unfortunately, I get a RuntimeError: can’t assign a FloatTensor to a scalar value of type float.
However, everything seems to work well when I detach the node a_max from the current graph:
import torch
from torch.autograd import Variable
import numpy as np
np.random.seed(0)
torch.manual_seed(0)
a = np.random.randint(0, 255, (5, 5)).astype(np.float32)
a = Variable(torch.from_numpy(a), requires_grad=True)
b = np.random.randint(0, 255, (5, 5)).astype(np.float32)
b = Variable(torch.from_numpy(b))
for _ in range(10):
    a_max = a.max().repeat(a.size()).detach()
    c = a / a_max
    b_max = b.max().repeat(b.size())
    d = b / b_max
    loss = torch.nn.MSELoss()(c, d)
    print(loss.data[0])
    loss.backward()
    a.data -= 1000000 * a.grad.data
# now c and d are close enough
print(c.data)
print(d.data)
I have no idea why detaching that node makes the program executable. I would appreciate it if you could point out what causes the runtime error. Thanks a lot!
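In case it helps anyone reproduce this on a newer PyTorch release, here is a minimal sketch of the same comparison without Variable. It assumes PyTorch 0.4 or later, where tensors carry requires_grad themselves and broadcasting replaces the .repeat(...) calls; normalized_mse is a hypothetical helper name introduced only for this sketch. It compares the gradient with and without a_max in the graph and is not a diagnosis of the error above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
a = torch.randint(0, 255, (5, 5)).float().requires_grad_()
b = torch.randint(0, 255, (5, 5)).float()
d = b / b.max()  # broadcasting replaces .repeat(b.size())


def normalized_mse(a, d, detach_max):
    """MSE between a / max(a) and the already-normalized target d (hypothetical helper)."""
    a_max = a.max()
    if detach_max:
        a_max = a_max.detach()  # treat the maximum as a constant
    return F.mse_loss(a / a_max, d)


# Gradient of the loss w.r.t. a, with and without the max in the graph.
(grad_full,) = torch.autograd.grad(normalized_mse(a, d, detach_max=False), a)
(grad_detached,) = torch.autograd.grad(normalized_mse(a, d, detach_max=True), a)

# The two gradients can only differ at the position(s) holding the maximum of a.
differs = (grad_full - grad_detached).abs() > 0
print(differs.nonzero())
print((a == a.max()).nonzero())
```

Both variants take the same direct path through the division a / a_max; keeping a_max in the graph only adds an extra gradient contribution at the entry (or entries) where a attains its maximum.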