Newbie question: Grad set to None in second optimizer pass

I am implementing manual gradient descent to understand gradients better.

class Optimizer:
    def __init__(self, m, b, learning_rate):
        self.m = m
        self.b = b
        self.learning_rate = learning_rate
        print(self.m, self.b)
        
    def calculate_error(self, x_data, y_label):        
        error = 0
        for x, y in zip(x_data, y_label):
            y_hat = self.m * x + self.b
            error = error + ((y_hat - y)**2)
        return error

    def optimize(self):
        loss = self.calculate_error(x_data_source, y_label_source)
        loss.backward()
        
        print(self.m.grad, self.b.grad)
        self.m = self.m - self.m.grad * self.learning_rate
        self.b = self.b - self.b.grad * self.learning_rate
        print(self.m.requires_grad)
        
        #self.m.grad.data.zero_()
        #self.b.grad.data.zero_()
        return loss

Running the Optimizer once gives the expected results; the second time I get this error:

---> 21         self.m = self.m - self.m.grad * self.learning_rate
TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

The console output of the print statements indicates that requires_grad stays True, but still the gradient is not recalculated:

tensor([-727.3233]) tensor([-94.7521])
True
None None

It looks like right after the reassignment of the variables (m and b), their grad is reset to None.

What am I missing here?
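To narrow it down, here is a stripped-down reproduction outside the class (the parameter setup is assumed, since it isn't shown above — I create m and b as leaf tensors with requires_grad=True):

```python
import torch

# Assumed setup: m and b are leaf tensors, data are plain tensors.
m = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0  # target line y = 3x + 2

loss = ((m * x + b - y) ** 2).sum()
loss.backward()
print(m.grad, b.grad)   # gradients are populated here

# Reassignment, as in optimize():
m = m - m.grad * 0.01
print(m.requires_grad)  # True
print(m.grad)           # None: m is now a new non-leaf tensor
```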

Try to change your weight update code to:

with torch.no_grad():
    self.m -= self.m.grad * self.learning_rate
    self.b -= self.b.grad * self.learning_rate

You want to mutate the values in-place without recording the update itself in the computation graph, so you should wrap the update in the torch.no_grad() context manager. Your original code reassigns self.m to the result of an operation, which makes it a new non-leaf tensor, and autograd only populates .grad on leaf tensors — that is why it comes back as None on the next pass.
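Put together, a minimal sketch of the fixed update loop (the parameter and data setup is my assumption, matching a line y = 3x + 2):

```python
import torch

# Assumed setup: leaf parameters and a simple regression target.
m = torch.tensor([0.0], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

losses = []
for _ in range(100):
    loss = ((m * x + b - y) ** 2).sum()
    loss.backward()
    losses.append(loss.item())
    with torch.no_grad():
        # In-place updates inside no_grad are not recorded in the graph,
        # so m and b stay the same leaf tensors and keep their .grad.
        m -= m.grad * 0.01
        b -= b.grad * 0.01
    # Reset accumulated gradients before the next backward pass.
    m.grad.zero_()
    b.grad.zero_()

print(losses[0], losses[-1])  # the loss decreases over the steps
```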

Alternatively, you could update the underlying .data.
It’s usually not recommended, but would also work:

self.m.data -= self.m.grad * self.learning_rate
self.b.data -= self.b.grad * self.learning_rate

Nice, thanks! It works now.
For anyone interested:
I also tried the in-place variant without the no_grad context. That gave me an error saying that a leaf variable that requires grad cannot be used in an in-place operation.
And by the way, I had to zero the gradients after each step:

        self.m.grad.data.zero_()
        self.b.grad.data.zero_()
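The accumulation is easy to see in isolation (a standalone sketch, not tied to the class above):

```python
import torch

# backward() accumulates into .grad instead of overwriting it, so
# without zeroing, each step would use the sum of all past gradients.
w = torch.tensor([2.0], requires_grad=True)

(w * 3.0).sum().backward()
first = w.grad.clone()      # tensor([3.])

(w * 3.0).sum().backward()  # accumulates into the same .grad
second = w.grad.clone()     # tensor([6.])

w.grad.zero_()              # reset before the next step
print(first, second, w.grad)
```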

Does anyone have an idea why the context manager is called no_grad()? It sounds like no gradients are being computed, but from what I understand it makes sure the operations are not recorded in the graph.
I find the naming in ML libraries I encountered pretty cryptic at times, but that might be caused by me being new to this :slight_smile:
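From a quick experiment, the name may make sense after all: results of operations performed inside the context come out with requires_grad=False, so no gradient could ever be computed through them later.

```python
import torch

a = torch.tensor([1.0], requires_grad=True)

c = a * 2               # normal mode: the operation is tracked
print(c.requires_grad)  # True

with torch.no_grad():
    d = a * 2           # not tracked: detached from the graph
print(d.requires_grad)  # False
```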