I’m looking to do a simple optimization, similar to the collaborative filtering example that Jeremy Howard (fast.ai) showed in his Deep Learning MOOC (Lesson 4 @ 1:08).

Given `blockData.shape=(20,14)` filled with random numbers, I want to start with two matrices, also filled with random numbers, with shapes `vert.shape=(20,10)` and `hori.shape=(10,14)`, such that at the end of the optimization I have minimized the quantity

```
mse( blockData - dot(vert,hori) )
```

by modifying `hori` and `vert` simultaneously using autograd.
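To make the objective concrete, this is the quantity I have in mind, expressed with torch ops (the variable names are my own; I’m assuming `@`/`torch.matmul` is the autograd-friendly counterpart of `numpy.dot`):

```python
import torch

blockData = torch.rand(20, 14)                 # the data to approximate
vert = torch.rand(20, 10, requires_grad=True)  # left factor
hori = torch.rand(10, 14, requires_grad=True)  # right factor

# mean squared reconstruction error, the quantity to minimize
loss = ((blockData - vert @ hori) ** 2).mean()
```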

If I write the `backward()` call as

```
loss = mse(tensor(numpy.dot(vert,hori)), blockData)
loss.backward()
```

I get the error `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`, which, based on my internet searching, is most commonly seen when `.requires_grad` isn’t set to `True`. It appears the gradient isn’t computed here because (maybe) there isn’t a variable on which to set `requires_grad=True`.
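From a small experiment, it looks like round-tripping through numpy may be the culprit: wrapping a numpy result back in `tensor()` creates a fresh leaf with no `grad_fn`, whereas a pure-torch product keeps one. (The `.detach()` calls below are needed because numpy refuses to convert tensors that require grad.)

```python
import torch
import numpy

a = torch.rand(3, 4, requires_grad=True)
b = torch.rand(4, 5, requires_grad=True)

c = a @ b                  # pure torch: stays in the autograd graph
print(c.requires_grad, c.grad_fn is not None)   # True True

# round-tripping through numpy detaches from the graph
d = torch.tensor(numpy.dot(a.detach().numpy(), b.detach().numpy()))
print(d.requires_grad, d.grad_fn)               # False None
# d.mean().backward()  # would raise the RuntimeError above
```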

My two main questions are:

- How can I display the contents of `tensors` so I can see what element 0 refers to?
- Can autograd do what I’m asking it to? And if so, how?
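For the first question, the only inspection I know how to do so far is on individual tensors, e.g.:

```python
import torch

x = torch.rand(2, 3, requires_grad=True)
print(x)               # full contents
print(x[0])            # element 0 along the first dimension
print(x.requires_grad, x.grad_fn)  # for a leaf tensor: True None
```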

Here is the code I have. It isn’t what I’m looking for, but it is close, and it runs without error:

```
from fastai.basics import tensor, nn
import torch, numpy, pandas

def hypothesis(vert, hori):
    return numpy.dot(vert, hori)

def mse(y_hat, y):
    return ((y_hat - y)**2).mean()

def update(y_hat):
    # perform gradient descent
    loss = mse(y_hat, blockData)
    loss.backward()
    if t % 10 == 0:
        print(t, '-------------', loss)
    with torch.no_grad():
        y_hat.sub_(lr * y_hat.grad)
        y_hat.grad.zero_()

vecSize = 10
shape = (20, 14)

# random large block of data
blockData = tensor(numpy.random.random_sample(shape))

hori = nn.Parameter(tensor(numpy.random.random_sample((vecSize, shape[1]))))
vert = nn.Parameter(tensor(numpy.random.random_sample((shape[0], vecSize))))

lr = 1e-1
y_hat = tensor(hypothesis(vert, hori))
y_hat.requires_grad_(True)

for t in range(101):
    update(y_hat)
```

The code above will drive `y_hat` to the correct answer, but that’s not really the idea: what I’m trying to do is get `vert` and `hori` to drive towards the correct answers as part of the same optimization. As written, I’m just taking the gradient from the hypothesis matrix (`y_hat`) to the target (`blockData`), which is a trivial optimization.
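For reference, what I imagine the working version would look like, if autograd can do this, is roughly the following sketch: everything stays in torch so the graph survives, and plain gradient descent steps both factors. The names and the update scheme are my own guesses, not a confirmed solution.

```python
import torch

torch.manual_seed(0)
shape, vecSize = (20, 14), 10
blockData = torch.rand(shape)

# both factors are leaves that autograd tracks
vert = torch.rand(shape[0], vecSize, requires_grad=True)
hori = torch.rand(vecSize, shape[1], requires_grad=True)

lr = 1e-1
for t in range(101):
    y_hat = vert @ hori                       # recompute each step, in torch
    loss = ((y_hat - blockData) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        vert -= lr * vert.grad
        hori -= lr * hori.grad
        vert.grad.zero_()
        hori.grad.zero_()
```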