What is the correct way to use mixed-precision training with OneCycleLR?

I am using OneCycleLR along with mixed precision training:

model = SwinNet().to(params['device'])
criterion = nn.BCEWithLogitsLoss().to(params['device'])
optimizer = torch.optim.AdamW(model.parameters(), lr=params['lr'])
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, ...)

def train(dataloader, model, criterion, optimizer, epoch, params):
    scaler = torch.cuda.amp.GradScaler() # enable mixed precision training
    stream = tqdm(dataloader)
    train_loss = 0
    for i, (images, target) in enumerate(stream, start=1):

        images = images.to(params['device'], non_blocking=True)
        target = target.to(params['device'], non_blocking=True).float().view(-1, 1)
        images, targets_a, targets_b, lam = mixup_data(images, target)
        with torch.cuda.amp.autocast(): # wrapper for mixed precision training
            output = model(images)
            loss = mixup_criterion(criterion, output, targets_a, targets_b, lam)
        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.step(scheduler) # this call raises the error
        scaler.update()
        train_loss += loss.item()
    train_loss /= len(dataloader)
    return train_loss

I am getting an error from the scaler.step(scheduler) call.

What is the correct way to implement mixed precision training along with OneCycleLR?

The step method of GradScaler expects an optimizer object, not a scheduler, so remove the scaler.step(scheduler) line.
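To make the contract concrete, here is a minimal, hypothetical sketch with a stand-in nn.Linear model (not the original SwinNet setup); scaling is disabled so it also runs on a CPU-only machine, but the call order is the same:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)                       # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# enabled=False so this sketch runs on CPU; use enabled=torch.cuda.is_available() in practice
scaler = torch.cuda.amp.GradScaler(enabled=False)

w_before = model.weight.detach().clone()
loss = model(torch.randn(2, 4)).sum()
scaler.scale(loss).backward()   # scale the loss, then backprop
scaler.step(optimizer)          # pass the optimizer here, never the scheduler
scaler.update()
```

After scaler.step(optimizer) the parameters have been updated; the scheduler is stepped separately, not through the scaler.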


Thanks for the response, it fixed the error!

I do have a question though -
The PyTorch docs for OneCycleLR say the scheduler should be stepped after every batch. However, GradScaler.step expects an optimizer object. Is there a way to use the LR schedule and the GradScaler together?

You can call the scheduler.step() after scaler.step(optimizer) to adapt the learning rate.
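A minimal sketch of the full per-batch ordering, using a stand-in nn.Linear model and made-up hyperparameters rather than the original SwinNet setup (autocast and scaling are disabled when CUDA is unavailable so the sketch runs anywhere):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
use_amp = device == 'cuda'

model = nn.Linear(8, 1).to(device)                  # stand-in model
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

steps_per_epoch, epochs = 4, 2                      # assumed, illustrative values
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, steps_per_epoch=steps_per_epoch, epochs=epochs)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for _ in range(epochs):
    for _ in range(steps_per_epoch):
        images = torch.randn(16, 8, device=device)  # fake batch
        target = torch.randint(0, 2, (16, 1), device=device).float()

        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=use_amp):
            output = model(images)
            loss = criterion(output, target)

        scaler.scale(loss).backward()   # scale the loss, then backprop
        scaler.step(optimizer)          # unscales grads, then optimizer.step()
        scaler.update()
        scheduler.step()                # per-batch, after the optimizer step

final_lr = optimizer.param_groups[0]['lr']
```

One caveat: when the scaler skips an optimizer step because of inf/NaN gradients, scheduler.step() still advances, so PyTorch may warn that lr_scheduler.step() was called before optimizer.step(); occasional skips at the start of training are generally harmless.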
