I’m trying to implement a custom optimization rule for my neural network. In particular, I don’t want to use the `.backward()` method. Here is my optimizer:
```python
import torch
from torch import optim

class MyOptimizer(optim.Optimizer):
    def __init__(self, params, lr=1e3):
        defaults = dict(lr=lr)
        super(MyOptimizer, self).__init__(params, defaults)

    def step(self, loss):
        for group in self.param_groups:
            grad = torch.autograd.grad(loss, group['params'], create_graph=True)
            for idx, p in enumerate(group['params']):
                p.grad = grad[idx]
                with torch.no_grad():
                    if p.grad is None:
                        continue
                    d_p = p.grad
                    p.add_(d_p, alpha=group['lr'])
```

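For reference, my understanding of the difference between the two gradient routes is based on this minimal check (the tensors and values are arbitrary placeholders, not part of my actual model):

```python
import torch

# A tiny scalar loss: loss = sum(w * x), with w requiring grad.
w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])
loss = (w * x).sum()

# Route 1: functional API — returns the gradients directly,
# without populating w.grad.
(g,) = torch.autograd.grad(loss, [w])

# Route 2: the usual backward(), which accumulates into w.grad.
loss2 = (w * x).sum()
loss2.backward()

print(torch.equal(g, w.grad))  # True: both equal x
```

So my optimizer’s `step` takes the loss and computes gradients via route 1, which is why I expect `loss.backward()` to be unnecessary.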
Does the `step` function implement `torch.optim.SGD` without introducing any new parameters to the computational graph?

Why is there a need for `with torch.no_grad()`, and is its placement correct?

How would using a `@torch.no_grad()` decorator instead of `with torch.no_grad()` change the model’s behavior?

I’m assuming that this implementation no longer needs to call `self.optimizer.zero_grad()` and `loss.backward()`. Is that correct?