A simplified version of my training pipeline looks like this:
import torch.cuda.amp as amp

scaler = amp.GradScaler()

optim.zero_grad()
with amp.autocast(enabled=True):
    logits1, logits2, logits3 = model(imgs)
    loss1 = criteria(logits1, labels)
    loss_aux = [criteria(logits2, labels), criteria(logits3, labels)]
    loss = loss1 + sum(loss_aux)
scaler.scale(loss).backward()   # backward on the scaled loss, outside the autocast block
scaler.step(optim)
scaler.update()
While the apex version is like this:
from apex import amp

optim.zero_grad()
logits1, logits2, logits3 = model(imgs)
loss1 = criteria(logits1, labels)
loss_aux = [criteria(logits2, labels), criteria(logits3, labels)]
loss = loss1 + sum(loss_aux)
with amp.scale_loss(loss, optim) as scaled_loss:
    scaled_loss.backward()
optim.step()
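For completeness, the apex path also assumes the model and optimizer were wrapped once before the training loop; a minimal sketch of that setup (the opt_level shown is just an example, not necessarily the one I use):

# apex requires wrapping the model and optimizer once, before the training loop;
# the opt_level value here is only an example.
from apex import amp

model, optim = amp.initialize(model, optim, opt_level="O1")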
I am using PyTorch 1.6.0 with Python 3.6.9 on Ubuntu 16.04 (in a Docker container) with CUDA 10.1.243/cuDNN 7.
There are two things that puzzle me:
The first one is that I received a warning like this:
/miniconda/envs/py36/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
though I am sure I call lr_scheduler.step() after scaler.step(optim), and when I set autocast(enabled=False) the warning does not appear.
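To make the ordering concrete, the relevant part of my loop looks roughly like this sketch (names are the same as in the first snippet; lr_scheduler stands for whichever scheduler is used):

# model, criteria, optim, scaler, imgs, labels as in the first snippet above
optim.zero_grad()
with amp.autocast(enabled=True):
    logits1, logits2, logits3 = model(imgs)
    loss = sum(criteria(lg, labels) for lg in (logits1, logits2, logits3))
scaler.scale(loss).backward()
scaler.step(optim)     # may internally skip optim.step() if inf/nan gradients are found
scaler.update()
lr_scheduler.step()    # called only after scaler.step(optim)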
The second is that the PyTorch native version is much slower than the apex version. When I use the autocast of PyTorch, the training time for 100 iterations is 110 s; when I use apex, it is around 80 s.
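A minimal sketch of how such a 100-iteration timing can be taken (run_one_iteration is a hypothetical placeholder for the loop body shown above):

import time
import torch

torch.cuda.synchronize()        # finish any queued GPU work before starting the clock
start = time.time()
for _ in range(100):
    run_one_iteration()         # hypothetical placeholder for the loop body shown above
torch.cuda.synchronize()        # wait for the GPU before reading the clock
print('100 iters: {:.1f}s'.format(time.time() - start))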
Is there any problem with my usage of this new feature, and how could I solve it?