I’m looking to do a simple optimization, similar to the collaborative filtering example that Jeremy Howard (fast.ai) showed in his Deep Learning MOOC (Lesson 4 @ 1:08).

Given `blockData.shape=(20,14)` filled with random numbers, I want to start with two matrices, also filled with random numbers, with shapes `vert.shape=(20,10)` and `hori.shape=(10,14)`, such that at the end of the optimization I have minimized the quantity

```
mse( blockData - dot(vert,hori) )
```

by modifying `hori` and `vert` simultaneously using autograd.
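To make the objective concrete, this is the quantity I have in mind, expressed with torch ops (the variable names are my own; I’m assuming `@`/`torch.matmul` is the autograd-friendly counterpart of `numpy.dot`):

```python
import torch

blockData = torch.rand(20, 14)                 # the data to approximate
vert = torch.rand(20, 10, requires_grad=True)  # left factor
hori = torch.rand(10, 14, requires_grad=True)  # right factor

# mean squared reconstruction error, the quantity to minimize
loss = ((blockData - vert @ hori) ** 2).mean()
```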

If I write the `backward()` call as

```
loss = mse(tensor(numpy.dot(vert,hori)), blockData)
loss.backward()
```

I get the error `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`, which, based on my internet searching, is most commonly seen when `.requires_grad` isn’t set to `True`. It appears the gradient isn’t computed here because (maybe) there isn’t a variable on which to set `requires_grad=True`.
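From a small experiment, it looks like round-tripping through numpy may be the culprit: wrapping a numpy result back in `tensor()` creates a fresh leaf with no `grad_fn`, whereas a pure-torch product keeps one. (The `.detach()` calls below are needed because numpy refuses to convert tensors that require grad.)

```python
import torch
import numpy

a = torch.rand(3, 4, requires_grad=True)
b = torch.rand(4, 5, requires_grad=True)

c = a @ b                  # pure torch: stays in the autograd graph
print(c.requires_grad, c.grad_fn is not None)   # True True

# round-tripping through numpy detaches from the graph
d = torch.tensor(numpy.dot(a.detach().numpy(), b.detach().numpy()))
print(d.requires_grad, d.grad_fn)               # False None
# d.mean().backward()  # would raise the RuntimeError above
```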

My two main questions are:

- How can I display the contents of `tensors` so I can see what element 0 refers to?
- Can autograd do what I’m asking it to? And if so, how?
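For the first question, the only inspection I know how to do so far is on individual tensors, e.g.:

```python
import torch

x = torch.rand(2, 3, requires_grad=True)
print(x)               # full contents
print(x[0])            # element 0 along the first dimension
print(x.requires_grad, x.grad_fn)  # for a leaf tensor: True None
```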

Here is the code I have. It isn’t what I’m looking for, but it is close, and it runs without error:

```
from fastai.basics import tensor, nn
import torch, numpy, pandas

def hypothesis(vert, hori):
    return numpy.dot(vert, hori)

def mse(y_hat, y):
    return ((y_hat - y)**2).mean()

def update(y_hat):
    # perform gradient descent
    loss = mse(y_hat, blockData)
    loss.backward()
    if t % 10 == 0:
        print(t, '-------------', loss)
    with torch.no_grad():
        y_hat.sub_(lr * y_hat.grad)
        y_hat.grad.zero_()

vecSize = 10
shape = (20, 14)

# random large block of data
blockData = tensor(numpy.random.random_sample(shape))

hori = nn.Parameter(tensor(numpy.random.random_sample((vecSize, shape[1]))))
vert = nn.Parameter(tensor(numpy.random.random_sample((shape[0], vecSize))))

lr = 1e-1
y_hat = tensor(hypothesis(vert, hori))
y_hat.requires_grad_(True)

for t in range(101):
    update(y_hat)
```

The code above will drive `y_hat` to the correct answer, but that’s not really the idea: what I’m trying to do is get `vert` and `hori` to drive towards the correct answers as part of the same optimization. As written, I’m just taking the gradient from the hypothesis matrix (`y_hat`) to the target (`blockData`), which is a trivial optimization.
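For reference, what I imagine the working version would look like, if autograd can do this, is roughly the following sketch: everything stays in torch so the graph survives, and plain gradient descent steps both factors. The names and the update scheme are my own guesses, not a confirmed solution.

```python
import torch

torch.manual_seed(0)
shape, vecSize = (20, 14), 10
blockData = torch.rand(shape)

# both factors are leaves that autograd tracks
vert = torch.rand(shape[0], vecSize, requires_grad=True)
hori = torch.rand(vecSize, shape[1], requires_grad=True)

lr = 1e-1
for t in range(101):
    y_hat = vert @ hori                       # recompute each step, in torch
    loss = ((y_hat - blockData) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        vert -= lr * vert.grad
        hori -= lr * hori.grad
        vert.grad.zero_()
        hori.grad.zero_()
```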