UnboundLocalError: local variable 'values' referenced before assignment in lr_scheduler

This is my code:

optimizer = torch.optim.AdamW(model.parameters(), lr=0.005, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0005, total_steps=total_steps, epochs=30)

When I call scheduler.step(), this error appears:

How can I solve this error?

Could you post a code snippet to reproduce this issue, please?
This dummy example runs fine:

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.005, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0005, total_steps=10, epochs=30)

output = model(torch.randn(1, 3, 224, 224))
loss = F.cross_entropy(output, torch.randint(0, 1000, (1,)))
loss.backward()
optimizer.step()
scheduler.step()
criterion = smp.utils.losses.DiceLoss()
for epoch in range(epochs):
    model.train()
    epoch_loss = 0
    for i, (image, mask) in enumerate(train_dl):
        optimizer.zero_grad()
        image = image.to(device).float()
        output = model(image)
        output = output.to('cpu')
        output = torch.sigmoid(output)
        loss = criterion(output, mask)
        epoch_loss += loss
        loss.backward()
        optimizer.step()
        scheduler.step()

Do you need any other code snippets? I’m not sure how much of the complete code you need.

A minimal and executable code snippet would be great.
Could you try to remove unnecessary functions and use some random inputs, so that we can reproduce this issue locally?

criterion = smp.utils.losses.DiceLoss()
for epoch in range(epochs):
    model.train()
    epoch_loss = 0
    for i, (image, mask) in enumerate(train_dl):
        data = torch.randn(8, 3, 128, 128)
        target = torch.randn(8, 4, 128, 128)
        optimizer.zero_grad()
        data = data.to(device).float()
        output = model(data)
        output = torch.sigmoid(output)
        output = output.to('cpu')
        loss = criterion(output, target)
        epoch_loss += loss
        loss.backward()
        optimizer.step()
        scheduler.step()

    epoch_loss = epoch_loss / total_steps
    print('epoch:{},epoch_loss:{}'.format(epoch, epoch_loss))

I’m sorry for my slow response. This code might suit your needs.

This is the error:

You can see that the first call to scheduler.step() must have been fault-free, because the print statement below ran, as did the model eval. The error occurred on the second call.

I forgot to include the code that comes after the loop; here it is:

model.eval()
metric, metric2, valid_loss = evalue(model, valid_dl)
if metric2 > best_score:
    state = {'state': model.state_dict(), 'best_score': metric2}
    torch.save(state, checkpoint_path)
    best_score = metric2
logging.basicConfig(filename='cloud4.log', level=logging.DEBUG, format='%(asctime)s-%(message)s')
logging.warning('epoch_loss:{},metric1:{},metric2:{}'.format(epoch_loss, metric, metric2))

Now it turns out that the commented-out code (which uses the real data from train_dl) raises the error, while the uncommented code with random inputs works fine:

# for image, mask in train_dl:
for i in range(5):
    data = torch.randn(8, 3, 128, 128)
    target = torch.randn(8, 4, 128, 128)
    optimizer.zero_grad()
    data = data.to(device).float()
    output = model(data)
    output = torch.sigmoid(output)
    output = output.to('cpu')
    loss = criterion(output, target)
    epoch_loss += loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    # image = image.to(device).float()
    # optimizer.zero_grad()
    # pre_mask = model(image)
    # pre_mask = pre_mask.to('cpu')
    # pre_mask = torch.sigmoid(pre_mask)
    # loss = criterion(pre_mask, mask)
    # epoch_loss += loss
    # loss.backward()
    # optimizer.step()
    # scheduler.step()

I really don’t know why. Could the data be what is causing the error?

Thanks for the code so far.
Could you also post the code you are using to initialize the model, optimizer, and scheduler?
Also, could you try to run the code on the CPU only and check if you see the same error?
If not, could you rerun the GPU code using CUDA_LAUNCH_BLOCKING=1 python script.py args and post the stack trace again?

This is my code:

model = smp.Unet('efficientnet-b3', encoder_weights=None, classes=4)

for m in model.modules():
    weights_init_kaiming(m)
# checkpoint = torch.load(checkpoint_path)
# model.load_state_dict(checkpoint['state'])
device = torch.device('cuda:0')
model.to(device)

total_steps = len(train_dl)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.05, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0005, total_steps=total_steps, epochs=30)

Unfortunately I could not debug on the CPU because of my machine, but when I debugged on the GPU the error did not change.

Could you call scheduler.get_lr() before the error is thrown and check the return value, please?
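
For reference, a minimal sketch of how that check could be added to the training loop from the earlier posts (the print placement is just a suggestion):

for i, (image, mask) in enumerate(train_dl):
    ...
    optimizer.step()
    # Print the scheduler state right before stepping, so the failing call
    # is easy to pin down (newer PyTorch also offers get_last_lr()).
    print('iteration:', i, 'lrs:', scheduler.get_lr())
    scheduler.step()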

I am very sorry for replying a few days late. I am a sophomore in university and have had a lot to do recently, so I didn’t deal with this problem for a few days. As you requested, I added this line of code. It returned the value successfully without any problems during the first epoch, but in the second epoch it reported an error.

Are you recreating or manipulating the scheduler or optimizer in each epoch somehow?

Thank you for answering my question so patiently. My code should not have this problem.
If I use the following code, the error does not appear (it does appear when the commented-out placement is used):

for epoch in range(epochs):
    for batch in train_loader:
        ...
        # scheduler.step()   # stepping here (once per batch) raises the error
    scheduler.step()          # stepping here (once per epoch) works fine

Here is my complete training code (when scheduler.step() is put into each batch iteration, the error appears):

for epoch in range(epochs):
    model.train()
    epoch_loss = 0
    epoch_mask_loss = 0
    epoch_label_loss = 0
    for image, mask, label in train_dl:
        optimizer.zero_grad()
        r = np.random.rand(1)

        # cutmix transform
        if r > threshold:
            lam = np.random.beta(50, 50)
            image, mask, cutmix_label = make_cutmix(image, mask, lam)
            image = image.to(device).float()
            mask_prediction, label_prediction = model(image)
            label_prediction = label_prediction.to('cpu')
            label_loss = lam * label_criterion(label_prediction, label) + (1 - lam) * label_criterion(label_prediction, cutmix_label)
        else:
            image = image.to(device).float()
            mask_prediction, label_prediction = model(image)
            label_prediction = label_prediction.to('cpu')
            label_loss = label_criterion(label_prediction, label)

        mask_prediction = torch.sigmoid(mask_prediction)
        mask_prediction = mask_prediction.to('cpu')
        mask_loss = mask_criterion(mask_prediction, mask)
        epoch_mask_loss += mask_loss
        epoch_label_loss += label_loss
        loss = label_loss + mask_loss
        epoch_loss += loss
        loss.backward()
        optimizer.step()

    epoch_loss = epoch_loss / total_steps
    epoch_label_loss = epoch_label_loss / total_steps
    epoch_mask_loss = epoch_mask_loss / total_steps
    print('epoch:{},epoch_loss:{},epoch_label_loss:{},epoch_mask_loss:{}'.format(epoch, epoch_loss, epoch_label_loss, epoch_mask_loss))
    model.eval()
    metric, metric2, valid_loss = evalue(model, valid_dl)
    if metric2 > best_score:
        state = {'state': model.state_dict(), 'best_score': metric2}
        torch.save(state, checkpoint_path)
        best_score = metric2

    logging.warning('epoch_loss:{},metric1:{},metric2:{}'.format(epoch_loss, metric, metric2))
    scheduler.step()

Hi,

I had the same error that I think has been fixed.

In the end it seems like the number of epochs you had mentioned in your scheduler was less than the number of epochs you tried training for.
I went into %debug in the notebook and tried calling self.get_lr() as suggested.
I got this message:
*** ValueError: Tried to step 3752 times. The specified number of total steps is 3750

Then with some basic math and a lot of code search I realised that I had specified 5 epochs in my scheduler but called for 10 epochs in my fit function.
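
For anyone hitting the same thing, here is a minimal, self-contained sketch of that mismatch (the model, loader size, and hyperparameters are made up for illustration):

import torch
from torch import nn

model = nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.005)

epochs = 10
steps_per_epoch = 5

# Wrong: the schedule only covers 5 of the 10 training epochs.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, total_steps=5 * steps_per_epoch)

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        optimizer.step()
        scheduler.step()   # raises once 5 * steps_per_epoch steps are exceeded

# Fix: size the schedule to the whole run, e.g. total_steps=epochs * steps_per_epoch,
# or pass epochs=epochs together with steps_per_epoch=steps_per_epoch.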

Hope this helps.


There is an error with total_steps. I was also getting the same error, but I rectified it.


Related to this issue:

If get_lr() throws an error, PyTorch suppresses it but will later hit this unbound “values” bug. Fix the get_lr() error and this bug will go away.
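
To make that pattern concrete, here is a simplified, hypothetical stand-in (not PyTorch’s actual step() implementation): if the call that assigns values raises and that exception gets swallowed, the later use of values fails with exactly this UnboundLocalError.

def broken_get_lr():
    raise ValueError('Tried to step 3752 times. The specified number of total steps is 3750')

def step(get_lr):
    try:
        values = get_lr()   # raises, so `values` is never assigned
    except ValueError:
        pass                # the underlying error is swallowed here
    for lr in values:       # UnboundLocalError: local variable 'values' referenced before assignment
        print(lr)

step(broken_get_lr)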

Python has lexical scoping by default, which means that although an enclosed scope can access values in its enclosing scope, it cannot modify them (unless they’re declared global with the global keyword). A closure binds values in the enclosing environment to names in the local environment. The local environment can then use the bound value, and even reassign that name to something else, but it can’t modify the binding in the enclosing environment. UnboundLocalError happens because when Python sees an assignment to a name anywhere inside a function, it treats that name as a local variable and will not fetch its value from the enclosing or global scope when the function runs. To modify a global variable inside a function, you must use the global keyword.
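
A small self-contained example of that scoping rule (the names are just for illustration):

counter = 0

def bump_broken():
    # The assignment below makes `counter` local to this function, so the
    # read on the right-hand side fails with UnboundLocalError.
    counter = counter + 1

def bump_fixed():
    global counter          # explicitly bind to the module-level name
    counter = counter + 1

bump_fixed()
print(counter)              # 1
bump_broken()               # UnboundLocalError: local variable 'counter' referenced before assignment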


Hello, I had the same error and I can’t solve it.
I want to ask you some questions about it.
What does ‘The specified number of total steps is 3750’ mean?
How do I change the number of steps?
Thank you.

Hi, make sure that your dataloader and the scheduler have the same number of iterations. If I remember correctly, I got this error when using the OneCycleLR scheduler, which needs you to specify the maximum number of steps as an init parameter. Hope this helps! If this isn’t the error you have, then please provide code and check what your scheduler.get_lr() method returns.
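
A minimal sketch of tying the two together, assuming the optimizer, train_dl, and epochs from the earlier posts and a loop that calls scheduler.step() once per batch:

# Either give OneCycleLR the total number of batches in the whole run ...
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, total_steps=epochs * len(train_dl))

# ... or let it compute the total from epochs and steps_per_epoch:
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, epochs=epochs, steps_per_epoch=len(train_dl))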