How to use torch.optim.lr_scheduler.ExponentialLR?

I am trying to train an LSTM model on an NLP problem.

I want to use learning rate decay with the torch.optim.lr_scheduler.ExponentialLR class, but I don't seem to be using it correctly.
My code:

optimizer = torch.optim.Adam(dual_encoder.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(200):

    context_id_list, response_id_list, label_array = load_ids_and_labels(dataframe, word_to_id)
    loss_acc = 0

    for i in range(len(label_array)):

        context = autograd.Variable(torch.LongTensor(context_id_list[i]).view(len(context_id_list[i]), 1), requires_grad=False)
        response = autograd.Variable(torch.LongTensor(response_id_list[i]).view(len(response_id_list[i]), 1), requires_grad=False)
        label = autograd.Variable(torch.FloatTensor(torch.from_numpy(np.array(label_array[i]).reshape(1, 1))), requires_grad=False)

        score = dual_encoder(context, response)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(score, label)
        loss_acc += loss.data[0]

        loss.backward()

        scheduler.step()  # ?

        optimizer.zero_grad()

    print("Epoch: ", epoch, ", Loss: ", (loss_acc / len(label_array)))

If I do it like this (following one of the PyTorch examples), the parameters do not get updated. If I call optimizer.step() instead, the scheduler is never applied, as far as I understand.

I would be happy to see a code example!
Thanks!


You should add optimizer.step() into your training loop and move scheduler.step() into the epoch loop.
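Roughly like this, reusing the names from your snippet (just a sketch; everything not shown is assumed unchanged):

optimizer = torch.optim.Adam(dual_encoder.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(200):
    context_id_list, response_id_list, label_array = load_ids_and_labels(dataframe, word_to_id)
    loss_acc = 0

    for i in range(len(label_array)):
        # ... build context, response and label exactly as before ...
        score = dual_encoder(context, response)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(score, label)
        loss_acc += loss.data[0]

        optimizer.zero_grad()   # clear the old gradients
        loss.backward()         # compute gradients for this sample
        optimizer.step()        # update the parameters every iteration

    scheduler.step()            # decay the learning rate once per epoch

    print("Epoch: ", epoch, ", Loss: ", (loss_acc / len(label_array)))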


Seems to work, thanks!

What if I want the learning rate schedule to be defined in terms of iterations instead of epochs? Can I still put my scheduler.step() inside the inner loop?

Sure, you could guard it with a condition.
E.g. if you would like to call it every 100 iterations:

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(data_loader):
        # Your training routine
        data = ...

        if (batch_idx+1) % 100 == 0:
            scheduler.step()

Thanks! Here’s some more code…

lstm = DSARNN(input_dims, sequence_length, cell_size)
criterion = torch.nn.MSELoss()
# optimiser = torch.optim.Adagrad(lstm.parameters(), lr=0.01)
optimiser = torch.optim.SGD(lstm.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimiser, step_size=3, gamma=0.1)

# register hooks
lstm.softmax.register_forward_hook(monitorAttention)

# init Tensorboard
tensorboard_step = 0
writer = SummaryWriter(comment="LSTM Cell + input attention, entropy " + str(bc.entropy()).replace('.', '_'))

for epoch in range(40):

    scheduler.step(epoch)

    # train
    for minibatch, target in tqdm(Batches(train, target_input), total=len(train)):
        optimiser.zero_grad()
        output = lstm(minibatch)
        loss = criterion(output, target)
        loss.backward()
        tensorboard_step += 1
        writer.add_scalar('training loss', loss, tensorboard_step)
        writer.add_scalar('learning rate', get_learning_rate(optimiser), tensorboard_step)
        optimiser.step()

scheduler.step() should be called after optimiser.step()
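For the snippet above that means moving it to the end of the epoch loop, e.g. (a sketch keeping your names; the TensorBoard logging is omitted):

for epoch in range(40):
    for minibatch, target in tqdm(Batches(train, target_input), total=len(train)):
        optimiser.zero_grad()
        output = lstm(minibatch)
        loss = criterion(output, target)
        loss.backward()
        optimiser.step()   # parameter update first

    scheduler.step()       # then the learning rate update, once per epoch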


thank you sooo much! 🙂


Welcome. Best of luck with your project.


Use scheduler.step() instead of scheduler.step(epoch). The epoch argument has strange behaviour when used with MultiStepLR, though it works fine with StepLR in your example.

Ref: https://discuss.pytorch.org/t/whats-the-difference-between-scheduler-step-and-scheduler-step-epoch/73054


Is there a cleaner way of doing this with ExponentialLR than the code below?

args.learning_rate = 0.001               # i.e. start_lr
args.learning_rate_decay_factor = 0.96
args.learning_rate_decay_step = 3000

optim = torch.optim.Adam(params=model.parameters(), lr=args.learning_rate)

lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optim, gamma=args.learning_rate_decay_factor, last_epoch=-1)

for epoch in range(1, args.num_epoch + 1):

    # Do forward pass here

    optim.zero_grad()
    loss.backward()
    optim.step()

    if epoch % args.learning_rate_decay_step == 0:
        lr_scheduler.step()

    # debugging purpose
    print(lr_scheduler.get_last_lr())  # prints the last learning rate

The idea is to handle the decay step in the scheduler itself, rather than adding an if condition to the training loop, like TensorFlow does:

tf.compat.v1.train.exponential_decay(
    learning_rate,
    global_step,
    decay_steps,
    decay_rate,
    staircase=False,
    name=None
)

Ref: tf.compat.v1.train.exponential_decay | TensorFlow v2.9.1
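The closest I have come up with so far is to fold the staircase into a LambdaLR and drop ExponentialLR, but I am not sure it is the intended approach (just a sketch reusing the args from above):

lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
    optim,
    # multiplicative factor applied to the initial lr; it only changes every decay_step epochs
    lr_lambda=lambda epoch: args.learning_rate_decay_factor ** (epoch // args.learning_rate_decay_step),
)

for epoch in range(1, args.num_epoch + 1):
    # forward pass, loss.backward(), optim.step() as above
    lr_scheduler.step()   # call every epoch; the staircase logic lives in the lambda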