Hi
I tried to use torch.optim.lr_scheduler.ReduceLROnPlateau
to adjust the learning rate. I followed the example in the docs:
Example:
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')
>>> for epoch in range(10):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Note that step should be called after validate()
>>>     scheduler.step(val_loss)
However, I got an error as:
Traceback (most recent call last):
  File "train.py", line 676, in <module>
    main()
  File "train.py", line 233, in main
    scheduler.step(val_loss)
  File "/usr/local/lib/python2.7/dist-packages/torch/optim/lr_scheduler.py", line 258, in step
    if self.is_better(current, self.best):
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 123, in __bool__
    torch.typename(self.data) + " is ambiguous")
RuntimeError: bool value of Variable objects containing non-empty torch.cuda.ByteTensor is ambiguous
Can someone please tell me what is going on here and what I should do?
Many thanks
I had a similar problem with this scheduler; in my case it was caused by passing a loss value from the training output instead of the validation output.
I think you need to show us your code for how you implemented the example.
Hi, Thank you for the reply.
Here is my code:
optimizer = torch.optim.SGD(model.parameters(), args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
scheduler = ReduceLROnPlateau(optimizer, 'min')
for epoch in xrange(args.start_epoch, args.epochs):
    train(train_loader, model, criterion, optimizer, epoch)
    result_avg, loss_val = validate(val_loader, model, criterion, epoch)
    scheduler.step(loss_val)
and here is how I got the validation loss:
out_cls = model(input_var)
loss_cls = [None] * 40
for i in range(len(loss_cls)):
    loss_cls[i] = criterion(out_cls[i], target_cls[i])
loss_val = 0
for i in range(len(loss_cls)):
    loss_val += loss_cls[i]
What's the difference between training loss and validation loss? I mean, they have the same data type, right?
Hmm… I am very new to PyTorch and still experimenting myself, so I am not 100% sure; please correct me if I am wrong.
But first, the output of a criterion/loss function is an autograd Variable, since it can be used like
loss = criterion(…)
loss.backward()
And looking at the scheduler class, scheduler.step() seems to expect a plain number (like a float), since it compares the value against some threshold values.
So when you accumulate loss_cls[i], try loss_cls[i].data[0] instead: variable.data is a tensor, while variable.grad is another Variable used for computing gradients.
You might want to try printing the output of criterion to check what type it is and what it contains.
Also, maybe you should average the loss value?
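To illustrate the "ambiguous" error itself: this is not the original poster's exact setup (on PyTorch 0.3, even a single-element CUDA Variable raised it), but a rough sketch on a current PyTorch build with a multi-element tensor hits the same wall, and extracting a Python float avoids it:

```python
import torch

# scheduler.step(val_loss) internally does a comparison like `if current < best:`.
# When `current` is a tensor (or, on old PyTorch, a Variable), the comparison
# returns another tensor, and Python cannot decide the truth value of a tensor
# holding more than one element, hence the "ambiguous" RuntimeError.
current = torch.tensor([0.4, 0.6])
best = torch.tensor([0.5, 0.5])

try:
    ambiguous = bool(current < best)
except RuntimeError:
    ambiguous = None  # raised: truth value of a multi-element tensor is ambiguous

# Extracting a plain Python float sidesteps the problem.
current_float = current.mean().item()
print(type(current_float))  # <class 'float'>
```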
You just need to convert the loss into a number.
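A minimal sketch of that conversion, using stand-in losses in place of `criterion(out_cls[i], target_cls[i])` (on 0.3-era PyTorch the extraction was `loss.data[0]`; on current versions it is `loss.item()`):

```python
import torch

# Stand-in per-attribute losses; in the code above these come from
# criterion(out_cls[i], target_cls[i]) and carry autograd history.
loss_cls = [torch.tensor(0.25), torch.tensor(0.75)]

loss_val = 0.0
for l in loss_cls:
    loss_val += l.item()   # .item() (or .data[0] on old PyTorch) gives a Python float
loss_val /= len(loss_cls)  # average over the tasks, as suggested above

print(loss_val)  # 0.5, a plain number that scheduler.step() can compare safely
```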
Thanks! Using .data[0] works.