Print current learning rate of the Adam Optimizer?

At the beginning of a training session, the Adam optimizer takes quite some time to find a good learning rate. I would like to accelerate my training by starting with the learning rate that Adam adapted to during the last training session.
Therefore, I would like to print out the current learning rate that PyTorch's Adam optimizer adapts to during a training session.
Thanks for your help! :slight_smile:

for param_group in optimizer.param_groups:
    print(param_group['lr'])

should do the job


There is a paper that proposes tuning the learning rate using the gradient of the update rule with respect to the learning rate itself. Basically, it dynamically learns the learning rate during training.

If you are interested, here is the paper https://arxiv.org/abs/1703.04782 and my implementation https://github.com/jpeg729/pytorch_bits/blob/master/optim/adam_hd.py#L67
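The core idea of the paper can be sketched in a few lines. This is a hedged illustration of hypergradient descent applied to plain SGD (not the full Adam-HD variant from the linked implementation): the learning rate is nudged by the dot product of the current and previous gradients, which is the gradient of the update rule with respect to the learning rate. The hyperparameters and the toy quadratic objective below are illustrative, not taken from the paper.

```python
import torch

# Toy objective f(w) = ||w||^2, whose gradient is simply 2 * w.
torch.manual_seed(0)
w = torch.randn(3)
lr = 0.01          # the learning rate, itself learned online
hyper_lr = 1e-4    # step size for the learning-rate update (illustrative)

grad = lambda w: 2 * w
prev_grad = torch.zeros_like(w)
for _ in range(100):
    g = grad(w)
    # hypergradient update: d(loss)/d(lr) = -g . prev_grad,
    # so gradient *descent* on lr adds hyper_lr * (g . prev_grad)
    lr = lr + hyper_lr * torch.dot(g, prev_grad).item()
    w = w - lr * g
    prev_grad = g
```

On this convex toy problem the learning rate grows while the gradients keep pointing in a consistent direction, then the increments shrink as the iterate approaches the minimum.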


Thanks for the quick response, but unfortunately this command just prints out the initial learning rate, not the current adapted one. It would really surprise me if it weren't possible to get the adapted learning rate somehow…


Thank you, I will look into it!

This method prints the learning rate currently used by the optimizer.

At the moment I am using just two input images for my training. When I start a training session with the network I pretrained myself, the error increases by some orders of magnitude (from a few hundred to 10,000 up to 40,000) and then settles back to the level it was at at the end of the last session. Throughout all this, the learning rate printed on the console is always the same initial one, which makes no sense to me.
I don't know what else could be the reason for this big temporary fluctuation of the error.

Did you save the optimizer state with the model?
Different optimizers tend to find different solutions, so changing optimizers or resetting their state can perturb training. That is why it can be important to save not only the model parameters but also the optimizer state.

Of course, in this case, that might have nothing to do with the fluctuation of the error that you are seeing.
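A minimal sketch of checkpointing both states together (the model, file name, and dictionary keys below are placeholders, not a fixed PyTorch convention):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# take one training step so the optimizer has accumulated state
model(torch.randn(4, 10)).sum().backward()
optimizer.step()

# save model and optimizer state together at the end of a session
torch.save({
    "model_state": model.state_dict(),
    "optim_state": optimizer.state_dict(),
}, "checkpoint.pt")

# restore both before resuming training
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optim_state"])
```

Restoring the optimizer state brings back Adam's running averages of the gradients, so resuming training should not produce the large error spike described above.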

Adam effectively has a separate step size for each parameter. The param_group['lr'] is a kind of base learning rate that does not change. There is no variable in the PyTorch Adam implementation that stores the dynamic, per-parameter learning rates.

One could save the optimizer state, as mentioned in the reply above.

The PyTorch implementation of Adam can be found here:
https://pytorch.org/docs/stable/_modules/torch/optim/adam.html

The line for p in group['params']: iterates over all the parameters and calculates the learning rate for each parameter.

I found this helpful: http://ruder.io/optimizing-gradient-descent/index.html#adam


Sorry guys, but these comments seem very misleading!
Optimizers have a fixed learning rate for all parameters. param_group['lr'] would allow you to set a different LR for each group of parameters (e.g. per layer), but it's generally not used very often, and most people use a single LR for the whole network.
What Adam does is keep a running average of the gradients for each parameter (not a LR!). The learning rate stays the same throughout training unless you use an lr_scheduler such as CosineAnnealingLR.


Also, you might want to look into ReduceLROnPlateau in torch.optim.lr_scheduler.
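A small sketch of how ReduceLROnPlateau changes the base learning rate: the scheduler halves the LR once the monitored metric has stopped improving for more than `patience` epochs. The loss values below are made up just to trigger one reduction; the factor and patience are illustrative choices.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)

# fake validation losses: improvement stalls after the second epoch
for fake_val_loss in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]:
    scheduler.step(fake_val_loss)
    print(optimizer.param_groups[0]["lr"])
```

Unlike Adam's internal running averages, this actually changes param_group['lr'], so the print statement from the first reply would show the reduced value.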