Problem with ReduceLROnPlateau

Hi

I tried to use torch.optim.lr_scheduler.ReduceLROnPlateau to adjust the learning rate, following the example in the docs:

Example:
            >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
            >>> scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')
            >>> for epoch in range(10):
            >>>     train(...)
            >>>     val_loss = validate(...)
            >>>     # Note that step should be called after validate()
            >>>     scheduler.step(val_loss)

However, I got the following error:

Traceback (most recent call last):
  File "train.py", line 676, in <module>
    main()
  File "train.py", line 233, in main
    scheduler.step(val_loss)
  File "/usr/local/lib/python2.7/dist-packages/torch/optim/lr_scheduler.py", line 258, in step
    if self.is_better(current, self.best):
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 123, in __bool__
    torch.typename(self.data) + " is ambiguous")
RuntimeError: bool value of Variable objects containing non-empty torch.cuda.ByteTensor is ambiguous

Can someone please tell me what is going on here and what I should do?

Many thanks

I had a similar problem with this scheduler; in my case it was caused by passing a loss value from the training output instead of the validation output.
I think you need to show us your code so we can see how you implemented the 'Example' :slight_smile:

Hi, thank you for the reply.

Here is my code:

optimizer = torch.optim.SGD(model.parameters(), args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
scheduler = ReduceLROnPlateau(optimizer, 'min')

for epoch in xrange(args.start_epoch, args.epochs):
    train(train_loader, model, criterion, optimizer, epoch)
    result_avg, loss_val = validate(val_loader, model, criterion, epoch)
    scheduler.step(loss_val)

and here is how I got the validation loss:

out_cls = model(input_var)

loss_cls = [None] * 40
for i in range(len(loss_cls)):
    loss_cls[i] = criterion(out_cls[i], target_cls[i])

loss_val = 0
for i in range(len(loss_cls)):
    loss_val += loss_cls[i]

What is the difference between the training loss and the validation loss? I mean, they have the same data type, right?

Hmm… I am very new to PyTorch and still experimenting as well, so I am not 100% sure; please correct me if I am wrong :slight_smile:

But first, the output of the criterion/loss function is an autograd Variable, since it can be used like

loss = criterion(…)
loss.backward()

And looking at the scheduler class, scheduler.step() seems to expect a plain number (like a float), since it compares the metric against some threshold values.

So, when you are adding up loss_cls[i], try loss_cls[i].data[0] instead: variable.data is the underlying tensor, while variable.grad is another Variable holding the gradients.

You might want to print the output of the criterion and check what type it is and what it contains.

Also, maybe you should average the loss over the 40 outputs?
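
Putting that together, something like this (a minimal sketch of what I mean, assuming out_cls and target_cls are the lists of 40 outputs/targets from your snippet):

loss_cls = [None] * 40
loss_val = 0.0                           # plain Python float, not a Variable
for i in range(len(loss_cls)):
    loss_cls[i] = criterion(out_cls[i], target_cls[i])
    loss_val += loss_cls[i].data[0]      # .data[0] pulls the scalar out of the Variable
loss_val /= len(loss_cls)                # optionally average over the 40 losses

scheduler.step(loss_val)                 # step() now receives a number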


You just need to convert the loss into a number.
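
For example, keeping your loss_val as it is and only converting it when you call the scheduler:

scheduler.step(loss_val.data[0])  # pass the underlying scalar, not the Variable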


Thanks! Using .data[0] works.