I want to inspect the growth of the parameters at each step. The value I want is not exactly the gradient, which I could grab with ‘param.grad’, because when SGD with momentum is applied, the actual update is roughly
lr * (momentum term 90% + current gradient 10%)
and it is this aggregated value that I want.
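To make the point concrete, here is a toy sketch (not PyTorch internals) of the update rule with momentum = 0.9 and dampening = 0; the applied step quickly diverges from lr * grad alone:

```python
import torch

# Toy illustration of the SGD-with-momentum update:
#   buf   = momentum * buf + grad
#   param = param - lr * buf
lr, momentum = 0.1, 0.9
buf = torch.zeros(1)

grads = [torch.tensor([1.0]), torch.tensor([1.0])]
for g in grads:
    buf = momentum * buf + g
    update = lr * buf  # the aggregated value actually applied to the parameter

# After two identical unit gradients, the applied update is
# lr * (1 + momentum) = 0.19, not lr * 1.0 = 0.1.
print(update)  # tensor([0.1900])
```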
I know it is possible to do this by calculating the difference between the parameters of the current step and the previous step. But this value is in fact already computed by the optimizer inside optimizer.step(), as shown in the figure below:
I want to get the ‘d_p’ value from the second-to-last line, p.data.add_(-group[‘lr’], d_p), because it is the actual value, after all the optimizer's calculations, that is applied to the network parameters. Is there any way to get it directly, short of overriding the step function myself?
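The difference-based workaround mentioned above can be sketched like this (a minimal example; the model, lr, and variable names are placeholders):

```python
import torch

# Sketch: recover the applied update by diffing parameters around step().
model = torch.nn.Linear(4, 2)
lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

model(torch.randn(8, 4)).sum().backward()

# Snapshot parameters, step, then diff: d_p = (old - new) / lr
before = [p.detach().clone() for p in model.parameters()]
optimizer.step()
updates = [(b - p.detach()) / lr for b, p in zip(before, model.parameters())]
```

On the very first step the momentum buffer is initialized from the gradient, so each recovered update equals the gradient itself; on later steps it includes the momentum term.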
Given that this is very specific to SGD, this is not part of the general optimizer interface.
You will have to modify the .step() function to return what you need.
I inherited from the ‘optim.SGD’ class and overrode the ‘step’ function. This is what I have done:
def step(self, closure=None):
    """Performs a single optimization step.

    Arguments:
        closure (callable, optional): A closure that reevaluates the model
            and returns the loss.
    """
    loss = None
    if closure is not None:
        loss = closure()

    d_ps_groups = []
    for group in self.param_groups:
        weight_decay = group['weight_decay']
        momentum = group['momentum']
        dampening = group['dampening']
        nesterov = group['nesterov']

        d_ps = []
        for p in group['params']:
            if p.grad is None:
                continue
            d_p = p.grad.data
            if weight_decay != 0:
                d_p = d_p.add(weight_decay, p.data)
            if momentum != 0:
                param_state = self.state[p]
                if 'momentum_buffer' not in param_state:
                    buf = param_state['momentum_buffer'] = torch.zeros_like(p.data)
                    buf.mul_(momentum).add_(d_p)
                else:
                    buf = param_state['momentum_buffer']
                    buf.mul_(momentum).add_(1 - dampening, d_p)
                if nesterov:
                    d_p = d_p.add(momentum, buf)
                else:
                    d_p = buf
            # keep a copy of the actual update before applying it
            d_ps.append(d_p.clone())
            p.data.add_(-group['lr'], d_p)
        d_ps_groups.append(d_ps)

    return loss, d_ps_groups
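An alternative sketch that avoids duplicating the SGD internals (and so stays valid if PyTorch changes them) is to diff the parameters inside the subclass; the class name SGDTracked is hypothetical, and the return shape mirrors the overridden step above:

```python
import torch

# Sketch: subclass SGD and recover the applied update by diffing parameters,
# rather than re-implementing the momentum/weight-decay logic.
class SGDTracked(torch.optim.SGD):
    def step(self, closure=None):
        before = [[p.detach().clone() for p in g['params']]
                  for g in self.param_groups]
        loss = super().step(closure)
        # d_p equivalent per parameter: (old - new) / lr
        d_ps_groups = [
            [(b - p.detach()) / g['lr'] for b, p in zip(bs, g['params'])]
            for bs, g in zip(before, self.param_groups)
        ]
        return loss, d_ps_groups

model = torch.nn.Linear(3, 1)
opt = SGDTracked(model.parameters(), lr=0.01, momentum=0.9)
model(torch.randn(5, 3)).sum().backward()
loss, d_ps_groups = opt.step()
```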
Thank you very much.