UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`

VahidZarghami · July 7, 2020, 9:57pm

Hi, I got this warning. Can anyone help me, please?

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

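import os
import sys
import time

import torch
import torch.nn as nn
import torch.optim as optim

# model, train_dl, valid_dl, to_var and validate_model are defined elsewhere
criterion = nn.CrossEntropyLoss()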
criterion = criterion.cuda()
optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)


model = train_model(model, train_dl, valid_dl, criterion, optimizer,scheduler, num_epochs=100)


def train_model(model, train_dl, valid_dl, criterion, optimizer,
                scheduler, num_epochs=10):

    if not os.path.exists('models'):
        os.mkdir('models')
    
    since = time.time()
       
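    # clone the tensors: state_dict() returns references that keep changing during training
    best_model_wts = {k: v.clone() for k, v in model.state_dict().items()}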
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch + 1, num_epochs))
        print('-' * 10)

        ## train and validate
        model = train_one_epoch(model, train_dl, criterion, optimizer, scheduler)
        val_acc = validate_model(model, valid_dl, criterion)
        
        # deep copy the model
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = {k: v.clone() for k, v in model.state_dict().items()}
            torch.save(best_model_wts, "./models/epoch-{}-acc-{:.5f}.pth".format(epoch, best_acc))

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model


def train_one_epoch(model, dataloder, criterion, optimizer, scheduler):
    if scheduler is not None:
        scheduler.step()
    
    model.train(True)
    
    steps = len(dataloder)  # number of batches, including a last partial batch
    
    running_loss = 0.0
    running_corrects = 0
    
    for i, (inputs, labels) in enumerate(dataloder):
        inputs, labels = to_var(inputs), to_var(labels)
        
        optimizer.zero_grad()
        
        # forward
        outputs = model(inputs)
        _, preds = torch.max(outputs.data, 1)
        loss = criterion(outputs, labels)
        
        # backward
        loss.backward()
        
        # update parameters
        optimizer.step()
        
        # statistics
        running_loss  = (running_loss * i + loss.item()) / (i + 1)
        running_corrects += torch.sum(preds == labels).item()
        
        # report
        sys.stdout.write("\r  Step %d/%d | Loss: %.5f" % (i + 1, steps, loss.item()))
        sys.stdout.flush()
        
    epoch_loss = running_loss
    epoch_acc = running_corrects / len(dataloder.dataset)
    
    sys.stdout.flush()
    print('\r{} Loss: {:.5f} Acc: {:.5f}'.format('  train', epoch_loss, epoch_acc))
    
    return model


Thank you

ptrblck · July 9, 2020, 2:10am

As the warning explains, starting with PyTorch 1.1.0 you should call scheduler.step() after optimizer.step().
In your current code you are calling scheduler.step() in the first lines of the train_one_epoch method. Move it after the optimizer.step() calls (for an epoch-based scheduler such as your StepLR, that means once at the end of the epoch, not inside the batch loop) and the warning should be gone.
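A minimal sketch of that change, assuming an epoch-based scheduler like the `StepLR(step_size=7)` in the question. The user's `to_var` helper and the progress printing are left out, random data stands in for `train_dl`, and the accuracy uses `argmax` instead of `torch.max(outputs.data, 1)`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, dataloader, criterion, optimizer, scheduler=None):
    """Train for one epoch; advance an epoch-based scheduler once at the end."""
    model.train()
    running_loss, running_corrects = 0.0, 0

    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()  # parameter update first ...

        running_loss += loss.item() * inputs.size(0)
        running_corrects += (outputs.argmax(dim=1) == labels).sum().item()

    # ... scheduler second. StepLR(step_size=7) counts epochs, so it is
    # stepped once per epoch here rather than inside the batch loop.
    if scheduler is not None:
        scheduler.step()

    n = len(dataloader.dataset)
    return running_loss / n, running_corrects / n


# Usage on random data, with the optimizer/scheduler settings from the question.
torch.manual_seed(0)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

for epoch in range(8):
    epoch_loss, epoch_acc = train_one_epoch(model, loader, criterion, optimizer, scheduler)
    print(f"epoch {epoch + 1}: loss {epoch_loss:.4f}, acc {epoch_acc:.3f}, "
          f"lr {scheduler.get_last_lr()[0]:.0e}")
```

With this order the first 7 epochs run at 1e-4 and the decay to 1e-5 takes effect from epoch 8 on.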

VahidZarghami · July 9, 2020, 10:48pm

Thank you very much.

localh · October 6, 2020, 3:53pm

I am getting the same warning and am wondering whether it is caused by GradScaler and autocast.

Here is my function, following https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples, that is causing the warning:

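# train function
# (model, optimizer, scheduler, scaler, DEVICE, ProgressBar and AverageMeter are defined elsewhere)
import torch.nn.functional as F
from torch.cuda.amp import autocast
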
def train(dataloader):
    pbar = ProgressBar(n_total=len(dataloader), desc='Training')
    train_loss = AverageMeter()
    model.train()
    for batch_idx, batch in enumerate(dataloader):
        b_features, b_target, b_idx = batch['features'].to(DEVICE),  batch['target'].to(DEVICE), batch['idx'].to(DEVICE)
        optimizer.zero_grad()
        with autocast():
            logits, probs = model(b_features)
            loss = F.cross_entropy(logits, b_target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scheduler.step()
        scaler.update()
        pbar(step=batch_idx, info={'loss': loss.item()})
        train_loss.update(loss.item(), n=1)
    return {'loss': train_loss.avg}

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

ptrblck · October 6, 2020, 10:03pm

Yes, if the scaler detects invalid gradients, the optimizer.step() call is skipped and thus the learning rate scheduler is called before the first parameter update.
We are working on a method to get the last status of the scaler. In the meantime you could try to check if the scale value was reduced and if so skip the learning rate scheduler step.
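A minimal sketch of that workaround for a per-iteration scheduler, as in the loop above. The helper name `amp_train_step` is hypothetical; `scale()`, `step()`, `update()`, `get_scale()` and `is_enabled()` are regular `GradScaler` methods. It assumes the default `backoff_factor=0.5`, under which the scale only shrinks when inf/NaN gradients were found and `optimizer.step()` was skipped. Note that `get_scale()` calls `.item()` and therefore synchronizes with the GPU every iteration. `torch.cuda.amp.GradScaler` matches the thread; newer releases also provide `torch.amp.GradScaler("cuda", ...)`, and `torch.autocast` needs PyTorch 1.10 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def amp_train_step(model, inputs, targets, optimizer, scheduler, scaler):
    """Hypothetical helper: one AMP training step that advances a per-iteration
    LR scheduler only if GradScaler really called optimizer.step()."""
    optimizer.zero_grad()
    device_type = "cuda" if inputs.is_cuda else "cpu"
    # Mixed precision only when the scaler is active (i.e. on CUDA).
    with torch.autocast(device_type=device_type, enabled=scaler.is_enabled()):
        loss = F.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()

    scale_before = scaler.get_scale()  # .item() under the hood: a GPU sync
    scaler.step(optimizer)             # skips optimizer.step() on inf/NaN grads
    scaler.update()                    # shrinks the scale after a skipped step
    step_skipped = scaler.get_scale() < scale_before

    if not step_skipped:
        scheduler.step()               # only after a real optimizer.step()
    return loss.item(), step_skipped


# Usage sketch on random data; on a CPU-only machine scaling is disabled,
# get_scale() stays at 1.0 and every step counts as a real update.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = nn.Linear(10, 3).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

for it in range(10):
    x = torch.randn(16, 10, device=device)
    y = torch.randint(0, 3, (16,), device=device)
    loss, skipped = amp_train_step(model, x, y, optimizer, scheduler, scaler)
    if skipped:
        print(f"iteration {it}: inf/NaN gradients, scheduler not stepped")
```

Skipping the scheduler step keeps the schedule aligned with the number of real parameter updates: an iteration whose update was dropped simply does not count.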

localh · October 6, 2020, 10:34pm

Thanks so much; I appreciate the feedback!

tcarvalho · December 17, 2020, 3:46pm

Hey,

I just want to confirm a hypothesis. I had a similar problem when using autocast and GradScaler, which didn’t happen in normal precision (FP32). The problem appeared right at the first training iteration, and I was using a pre-trained model.

To find the problem I used detect_anomaly, and the problem seemed to originate right at the beginning of the backprop (I’m using the cross-entropy loss). I also looked at the input of the loss and at the weights of the model, and everything seemed fine. However, I noticed that in normal precision some of the weights had really large gradients.

I figured the problem might be caused by GradScaler scaling some of these gradients too high, so that they became inf. Lowering the init_scale of the GradScaler seemed to fix it: I’m currently using 2^14 instead of 2^16 and everything seems to work fine now.

Can you confirm that this hypothesis holds up and that I’m not missing something?

(I also wanted to post this somewhere, since I didn’t find anything on this issue anywhere else.)

Thanks!

ptrblck · December 17, 2020, 7:56pm

Yes, your explanation is correct. The default scale factor is 2**16, which could yield invalid gradients in the first iteration(s); this is expected and usually not a problem, since the scaler skips those steps and reduces the scale. Lowering the initial value might avoid these initial decreases of the scaling factor, but the right value is model dependent.
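For reference, a sketch of the constructor call with the documented default arguments written out and only `init_scale` lowered to 2**14, the value reported in the post above. Each overflow multiplies the scale by `backoff_factor` and skips that step, so starting at 2**16 costs two extra skipped iterations (2**16 · 0.5 · 0.5 = 2**14) compared with starting at 2**14, assuming 2**14 is already small enough for the model.

```python
import torch

use_cuda = torch.cuda.is_available()

# Default GradScaler settings spelled out; only init_scale is lowered.
scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0**14,
    growth_factor=2.0,      # scale *= 2 after growth_interval clean steps
    backoff_factor=0.5,     # scale *= 0.5 (and the step is skipped) on inf/NaN
    growth_interval=2000,
    enabled=use_cuda,       # without CUDA the scaler is a pass-through
)

# Before the first scaler.scale() call, get_scale() reports init_scale
# (or 1.0 when scaling is disabled).
print(scaler.get_scale())
```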

wiiiktor · August 28, 2021, 1:17am

Can I mute this UserWarning? I call lr_scheduler.step() at the very end of my script (PyTorch 1.7.0), but I get this warning anyway.

ptrblck · August 28, 2021, 8:20am

I’m not sure how to filter out this exact warning, but since it should be raised only once I hope it’s not making too much noise in your log.
In any case, since the scheduler.step() is apparently applied at the end of the script, I would recommend checking why this warning is raised at all.

BryanZhou · September 9, 2021, 6:34am

Hi @ptrblck, is this method available in a PyTorch release yet?

McNugget1130 · March 23, 2022, 12:05pm

I ran into the same issue. I put “scheduler.step()” at the end of the “for epoch in range()” loop, but the warning still pops up.

kassy11 · January 28, 2023, 4:19pm

I came across the same warning. Does this affect the training results?

JayceWong · June 20, 2024, 9:55am

I have the same concern. Can anyone give an answer?
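The warning text itself names the effect: PyTorch skips the first value of the learning rate schedule. A small sketch makes this visible; the function name is hypothetical and the toy `StepLR(step_size=1, gamma=0.5)` halves the rate on every scheduler step so the shift is easy to read. With the wrong order, the whole schedule runs one scheduler step early. In the first post's code, for instance, `StepLR(step_size=7, gamma=0.1)` stepped at the start of each epoch decays the rate from the 7th epoch on, so only 6 epochs instead of 7 run at the initial rate. In the GradScaler case, each skipped update whose `scheduler.step()` still runs shifts a per-iteration schedule by one iteration, which is likely negligible for long schedules.

```python
import warnings

import torch


def lrs_seen_by_updates(scheduler_first, num_updates=4):
    """Return the learning rate each optimizer.step() actually uses."""
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=1.0)
    # Halve the LR on every scheduler.step() so the shift is easy to see.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
    used = []
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # the wrong order triggers the warning
        for _ in range(num_updates):
            if scheduler_first:
                scheduler.step()  # wrong order (pre-1.1.0 style)
            param.grad = torch.ones_like(param)
            used.append(optimizer.param_groups[0]["lr"])
            optimizer.step()
            if not scheduler_first:
                scheduler.step()  # correct order
    return used


print(lrs_seen_by_updates(scheduler_first=False))  # [1.0, 0.5, 0.25, 0.125]
print(lrs_seen_by_updates(scheduler_first=True))   # [0.5, 0.25, 0.125, 0.0625]
```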

pdc_87 · November 28, 2024, 12:45am

Hi @ptrblck, has anyone solved this for the GradScaler case?

        # Backpropagate
        self.scaler.scale(loss).backward()
        # nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
        self.scaler.step(self.optimizer)
        self.scaler.update()
        self.lr_scheduler.step()

I face the same issue here.