Got the following error while training my model
> 95 scheduler.step()
...
ValueError: Tried to step 1402 times. The specified number of total steps is 1400
But I have NOT specified anywhere in my code that the total steps should be 1400.
Please help.
You need to provide more information: share a minimal code snippet, the line where the error originates in the stack trace, and so on.
However, I remember this sort of error popping up when using PyTorch Lightning. There, it occurs when you use a scheduler with the optimizer without passing the max_epochs argument.
Here’s the code snippet:
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(),lr=0.001)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.001,
steps_per_epoch=int(len(train_dl)),
epochs=num_epochs,
anneal_strategy='linear')
You are defining the number of steps via steps_per_epoch and epochs, as described in the docs:

epochs (int) – The number of epochs to train for. This is used along with steps_per_epoch in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: None

steps_per_epoch (int) – The number of steps per epoch to train for. This is used along with epochs in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: None
and as seen here:
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(),lr=0.001)
num_epochs = 10
steps_per_epoch = 2
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.001,
steps_per_epoch=steps_per_epoch,
epochs=num_epochs,
anneal_strategy='linear')
for epoch in range(num_epochs):
    for _ in range(steps_per_epoch):
        optimizer.step()
        scheduler.step()

# one extra pair of steps beyond epochs * steps_per_epoch = 20
optimizer.step()
scheduler.step()
# ValueError: Tried to step 22 times. The specified number of total steps is 20
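One way to avoid this mismatch is to pass total_steps explicitly and make sure scheduler.step() is called exactly that many times. A minimal sketch, assuming a hypothetical 20-step run (not the poster's actual training loop):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# total_steps is set directly instead of being inferred from
# epochs * steps_per_epoch; 20 is an illustrative value.
total_steps = 20
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.001, total_steps=total_steps,
    anneal_strategy='linear')

# Call scheduler.step() exactly total_steps times, once per optimizer step.
for _ in range(total_steps):
    optimizer.step()
    scheduler.step()

# No ValueError: the number of scheduler.step() calls matches total_steps.
```

If your actual number of batches per epoch differs from steps_per_epoch (e.g. with drop_last=False or gradient accumulation), the inferred budget and the real number of scheduler.step() calls drift apart, which produces exactly the error above.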