Autograd and implementing a custom optimizer

I’m trying to implement a custom optimization rule for my neural network. In particular, I don’t want to use the .backward() method. Here is my optimizer:

import torch
from torch import optim

class MyOptimizer(optim.Optimizer):

    def __init__(self, params, lr=1e-3):
        defaults = dict(lr=lr)
        super(MyOptimizer, self).__init__(params, defaults)

    def step(self, loss):
        for group in self.param_groups:
            # Compute gradients directly instead of calling loss.backward()
            grad = torch.autograd.grad(loss, group['params'], create_graph=True)

            for idx, p in enumerate(group['params']):
                p.grad = grad[idx]

                with torch.no_grad():
                    if p.grad is None:
                        continue
                    d_p = p.grad
                    p.add_(d_p, alpha=-group['lr'])

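For context, here is a minimal, self-contained sketch of how I call it. The tiny linear model, the data, and the learning rate are made up purely for illustration:

```python
import torch
from torch import nn, optim

class MyOptimizer(optim.Optimizer):
    def __init__(self, params, lr=1e-3):
        super(MyOptimizer, self).__init__(params, dict(lr=lr))

    def step(self, loss):
        for group in self.param_groups:
            # Gradients come from autograd.grad, not loss.backward()
            grad = torch.autograd.grad(loss, group['params'], create_graph=True)
            for idx, p in enumerate(group['params']):
                p.grad = grad[idx]
                with torch.no_grad():
                    if p.grad is None:
                        continue
                    p.add_(p.grad, alpha=-group['lr'])

# Hypothetical toy model and data, only to exercise step(loss)
torch.manual_seed(0)
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

opt = MyOptimizer(model.parameters(), lr=1e-2)
loss_before = nn.functional.mse_loss(model(x), y)
opt.step(loss_before)  # note: loss is passed in; no loss.backward() call
loss_after = nn.functional.mse_loss(model(x), y)
```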
  1. Does the step function implement torch.optim.SGD without introducing any new parameters to the computational graph?

  2. Why is there a need for with torch.no_grad() and is its placement correct?

  3. How would using the @torch.no_grad() decorator instead of the with torch.no_grad() context manager change the model’s behavior?

  4. I’m assuming that this implementation does not need to call self.optimizer.zero_grad() and loss.backward() anymore. Is that correct?
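For reference, this is the conventional loop I’m trying to replace (a standard SGD sketch with a made-up toy model, not my actual training code):

```python
import torch
from torch import nn, optim

# Hypothetical toy model and data for illustration
torch.manual_seed(0)
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

opt = optim.SGD(model.parameters(), lr=1e-2)
loss = nn.functional.mse_loss(model(x), y)

opt.zero_grad()   # clear stale gradients from the previous iteration
loss.backward()   # populate p.grad for every parameter
opt.step()        # p <- p - lr * p.grad
```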
