I’m trying to implement a custom optimization rule for my neural network. In particular, I don’t want to use the `.backward()` method. Here is my optimizer:
```python
import torch
from torch import optim

class MyOptimizer(optim.Optimizer):
    def __init__(self, params, lr=1e3):
        defaults = dict(lr=lr)
        super(MyOptimizer, self).__init__(params, defaults)

    def step(self, loss):
        for group in self.param_groups:
            grad = torch.autograd.grad(loss, group['params'], create_graph=True)
            for idx, p in enumerate(group['params']):
                p.grad = grad[idx]
                with torch.no_grad():
                    if p.grad is None:
                        continue
                    d_p = p.grad
                    p.add_(d_p, alpha=group['lr'])
```

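For reference, my understanding of the difference between the two gradient routes is based on this minimal check (the tensors and values are arbitrary placeholders, not part of my actual model):

```python
import torch

# A tiny scalar loss: loss = sum(w * x), with w requiring grad.
w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])
loss = (w * x).sum()

# Route 1: functional API — returns the gradients directly,
# without populating w.grad.
(g,) = torch.autograd.grad(loss, [w])

# Route 2: the usual backward(), which accumulates into w.grad.
loss2 = (w * x).sum()
loss2.backward()

print(torch.equal(g, w.grad))  # True: both equal x
```

So my optimizer’s `step` takes the loss and computes gradients via route 1, which is why I expect `loss.backward()` to be unnecessary.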
Does the `step` function implement `torch.optim.SGD` without introducing any new parameters to the computational graph?

Why is there a need for `with torch.no_grad()`, and is its placement correct?

How would using a `@torch.no_grad()` decorator instead of `with torch.no_grad()` change the model’s behavior?

I’m assuming that this implementation no longer needs to call `self.optimizer.zero_grad()` and `loss.backward()`. Is that correct?