Hi all! I want to try a weird idea: using gradient descent to make one normalized matrix *A* ‘closer’ to another normalized matrix *B*, i.e., minimize the mean squared error between *A* (which can be viewed as the *output*) and *B* (which can be viewed as the *target*). So I wrote the following toy code:

```
import torch
from torch.autograd import Variable
import numpy as np
np.random.seed(0)
torch.manual_seed(0)
a = np.random.randint(0, 255, (5, 5)).astype(np.float32)
a = Variable(torch.from_numpy(a), requires_grad=True)
b = np.random.randint(0, 255, (5, 5)).astype(np.float32)
b = Variable(torch.from_numpy(b))
for _ in range(10):
    a_max = a.max().repeat(a.size())
    c = a / a_max
    b_max = b.max().repeat(b.size())
    d = b / b_max
    loss = torch.nn.MSELoss()(c, d)
    print(loss.data[0])
    loss.backward()
    a.data -= 1000000 * a.grad.data
```
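Written out, the objective this loop is meant to minimize is the mean squared error between the two max-normalized matrices. The symbol $N$ is my own notation rather than something from the code; here $N = 25$, the number of entries in each $5 \times 5$ matrix:

$$
L(A) = \frac{1}{N} \sum_{i,j} \left( \frac{A_{ij}}{\max A} - \frac{B_{ij}}{\max B} \right)^2
$$

Each iteration then moves `a` against `a.grad`, scaled by a step size of $10^6$.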

Unfortunately, I get a **RuntimeError: can’t assign a FloatTensor to a scalar value of type float.**

However, everything seems to work when I detach the node **a_max** from the current graph:

```
import torch
from torch.autograd import Variable
import numpy as np
np.random.seed(0)
torch.manual_seed(0)
a = np.random.randint(0, 255, (5, 5)).astype(np.float32)
a = Variable(torch.from_numpy(a), requires_grad=True)
b = np.random.randint(0, 255, (5, 5)).astype(np.float32)
b = Variable(torch.from_numpy(b))
for _ in range(10):
    a_max = a.max().repeat(a.size()).detach()
    c = a / a_max
    b_max = b.max().repeat(b.size())
    d = b / b_max
    loss = torch.nn.MSELoss()(c, d)
    print(loss.data[0])
    loss.backward()
    a.data -= 1000000 * a.grad.data
# now c and d are close enough
print(c.data)
print(d.data)
```

I have no idea why detaching that node makes the program run. I would appreciate it if you could point out what causes the runtime error. Thanks a lot!
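For anyone comparing the two versions, here is a minimal sketch of what detaching the maximum changes numerically. It assumes a recent PyTorch release, where `Variable` is no longer needed and plain tensors carry `requires_grad`. It does not raise the error above, so it only illustrates the difference in gradients, not the cause of the exception. The helper name `grad_of_normalized_sum` and the 2×2 example matrix are made up for illustration, and a plain sum stands in for the MSE loss to keep the numbers easy to check by hand.

```python
import torch


def grad_of_normalized_sum(detach_max):
    """Return d(sum(a / max(a))) / da for a tiny example matrix (hypothetical helper)."""
    # Small hand-checkable matrix; the largest entry is 4.0.
    a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
    a_max = a.max()
    if detach_max:
        # Treat the maximum as a constant, as in the second snippet above.
        a_max = a_max.detach()
    # Broadcasting divides every entry by the scalar, so no repeat() is needed.
    c = a / a_max
    c.sum().backward()
    return a.grad


grad_full = grad_of_normalized_sum(detach_max=False)
grad_detached = grad_of_normalized_sum(detach_max=True)
print(grad_full)
print(grad_detached)
```

With the maximum detached, every entry gets the gradient 1/4 = 0.25. Without detaching, the entry that holds the maximum also receives the gradient flowing through the denominator, so its gradient becomes 0.25 - 10/16 = -0.375 while the other entries stay at 0.25. So the two versions are not the same optimization problem, even when both run.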