Janinanu
(Janina Nuber)
January 17, 2018, 2:43pm
I am trying to train an LSTM model on an NLP problem.
I want to use learning rate decay with the torch.optim.lr_scheduler.ExponentialLR class, but I can't seem to use it correctly.
My code:
optimizer = torch.optim.Adam(dual_encoder.parameters(), lr = 0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma = 0.95)

for epoch in range(200):
    context_id_list, response_id_list, label_array = load_ids_and_labels(dataframe, word_to_id)
    loss_acc = 0
    for i in range(len(label_array)):
        context = autograd.Variable(torch.LongTensor(context_id_list[i]).view(len(context_id_list[i]), 1), requires_grad = False)
        response = autograd.Variable(torch.LongTensor(response_id_list[i]).view(len(response_id_list[i]), 1), requires_grad = False)
        label = autograd.Variable(torch.FloatTensor(torch.from_numpy(np.array(label_array[i]).reshape(1, 1))), requires_grad = False)
        score = dual_encoder(context, response)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(score, label)
        loss_acc += loss.data[0]
        loss.backward()
        scheduler.step()  #?
        optimizer.zero_grad()
    print("Epoch: ", epoch, ", Loss: ", (loss_acc/len(label_array)))
If I do it like this, following one PyTorch example, the parameters do not get updated. If I use optimizer.step() instead, the scheduler is never applied, as far as I understand.
I would be happy about a code example!
Thanks!
You should add optimizer.step() into your training loop and move scheduler.step() into the epoch loop.
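A minimal, self-contained sketch of that ordering (using a toy linear model and MSE loss just for illustration, not your dual_encoder setup):

import torch

# toy model and data, only to demonstrate where the calls go
model = torch.nn.Linear(10, 1)
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(200):
    # inner loop over your samples/batches would go here
    optimizer.zero_grad()   # clear gradients from the previous step
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()         # compute gradients
    optimizer.step()        # update the parameters every iteration
    scheduler.step()        # decay the learning rate once per epoch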
g1910
(Gaurav Mittal)
March 6, 2018, 10:08am
What if I want the learning rate schedule defined in terms of iterations instead of epochs? Can I still put my scheduler.step() inside the inner loop?
Sure, you could guard it with a condition.
E.g. if you would like to call it every 100 iterations:
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(data_loader):
        # Your training routine
        data = ...
        if (batch_idx + 1) % 100 == 0:
            scheduler.step()
Thanks! Here’s some more code…
lstm = DSARNN(input_dims, sequence_length, cell_size)
criterion = torch.nn.MSELoss()

# optimiser = torch.optim.Adagrad(lstm.parameters(), lr=0.01)
optimiser = torch.optim.SGD(lstm.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimiser, step_size=3, gamma=0.1)

# register hooks
lstm.softmax.register_forward_hook(monitorAttention)

# init Tensorboard
tensorboard_step = 0
writer = SummaryWriter(comment="LSTM Cell + input attention, entropy " + str(bc.entropy()).replace('.', '_'))

for epoch in range(40):
    scheduler.step(epoch)

    # train
    for minibatch, target in tqdm(Batches(train, target_input), total=len(train)):
        optimiser.zero_grad()
        output = lstm(minibatch)
        loss = criterion(output, target)
        loss.backward()

        tensorboard_step += 1
        writer.add_scalar('training loss', loss, tensorboard_step)
        writer.add_scalar('learning rate', get_learning_rate(optimiser), tensorboard_step)

        loss = optimiser.step()
scheduler.step() should be called after optimiser.step()
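Applied to the loop above, that would mean something along these lines (only the ordering changes; everything else stays as in the posted code):

for epoch in range(40):
    # train
    for minibatch, target in tqdm(Batches(train, target_input), total=len(train)):
        optimiser.zero_grad()
        output = lstm(minibatch)
        loss = criterion(output, target)
        loss.backward()
        optimiser.step()
    scheduler.step()   # once per epoch, after the optimiser has stepped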
Welcome. Best of luck with your project.
Liang
(Liang)
March 12, 2020, 10:37pm
DuaneNielsen: Use scheduler.step() instead of scheduler.step(epoch). It has strange behaviour when using MultiStepLR, though it works fine for StepLR in your example.
Ref: https://discuss.pytorch.org/t/whats-the-difference-between-scheduler-step-and-scheduler-step-epoch/73054
Is there a cleaner way of doing this for ExponentialLR than the code below:
args.learning_rate = 0.001               # i.e. start_lr
args.learning_rate_decay_factor = 0.96
args.learning_rate_decay_step = 3000

optim = torch.optim.Adam(params=model.parameters(), lr=args.learning_rate)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optim, gamma=args.learning_rate_decay_factor, last_epoch=-1)

for epoch in range(1, args.num_epoch + 1):
    # Do forward pass here
    optim.zero_grad()
    loss.backward()
    optim.step()

    if epoch % args.learning_rate_decay_step == 0:
        lr_scheduler.step()
        # debugging purpose
        print(lr_scheduler.get_last_lr())  # will print the last learning rate
The idea is to take care of decay_step in the scheduler itself, rather than adding an if condition to the training loop, like in TensorFlow:
tf.compat.v1.train.exponential_decay( learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
Ref: tf.compat.v1.train.exponential_decay | TensorFlow v2.9.1
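As a rough sketch of what that could look like in PyTorch (this is an assumption on my part, not something from the thread: it uses LambdaLR instead of ExponentialLR so the decay step can live inside the scheduler itself):

import torch

model = torch.nn.Linear(10, 1)   # placeholder model, just for illustration
start_lr = 0.001
decay_factor = 0.96
decay_step = 3000
num_epoch = 10000

optim = torch.optim.Adam(model.parameters(), lr=start_lr)
# staircase decay: lr(epoch) = start_lr * decay_factor ** (epoch // decay_step)
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
    optim, lr_lambda=lambda epoch: decay_factor ** (epoch // decay_step))

for epoch in range(1, num_epoch + 1):
    # forward pass, optim.zero_grad(), loss.backward(), optim.step() as before
    lr_scheduler.step()          # called every epoch; the floor division handles the step size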