`scheduler` gives me worse results

I tried adding a learning rate scheduler to my network to get better convergence, but I get worse results.

Without a scheduler:

With a scheduler, on the same network:

Full code: Denoising Autoencoders (DAE) in PT & PT⚡ (Kaggle)

What is my mistake?

Are you calling scheduler.step()? If the scheduler includes a learning rate warmup and step() is never called, the learning rate never changes from its initial value.
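A minimal sketch of that warmup situation, assuming LinearLR as the warmup scheduler (the thread does not name one) and a single dummy parameter standing in for a model. The learning rate only ramps up as scheduler.step() is called; a scheduler that is never stepped leaves it at the reduced starting value:

```python
import torch

# Hypothetical setup: one dummy parameter with a base learning rate of 0.1
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)

# LinearLR used as a simple warmup: start at 10% of the base LR
# and ramp linearly to 100% over 5 scheduler steps
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, end_factor=1.0, total_iters=5
)

lrs = [optimizer.param_groups[0]["lr"]]
for _ in range(5):
    optimizer.step()   # in real training this follows loss.backward()
    scheduler.step()   # advance the warmup by one step
    lrs.append(optimizer.param_groups[0]["lr"])

# For comparison: the same warmup, but scheduler.step() is never called
frozen_opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
frozen_sched = torch.optim.lr_scheduler.LinearLR(frozen_opt, start_factor=0.1, total_iters=5)
frozen_lr = frozen_opt.param_groups[0]["lr"]

print(lrs, frozen_lr)
```

In this sketch lrs rises from 0.01 to 0.1 over five steps, while frozen_lr stays at 0.01 for the whole run.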

Yes, I do, following the documentation:

import sys

import torch
from tqdm import tqdm

# tqdm progress bar
pbar = tqdm(total=epochs * len(dataloader), file=sys.stdout, colour='green')

for epoch in range(1, epochs+1):
    # monitor training loss
    train_loss = 0.0
    # Loops over our dataset in the batches the data loader creates for us
    for data in dataloader:
        # Get the data
        images, labels = data
        # flatten images: keep the batch dimension and merge all the remaining
        # dimensions (channels, height, width) into one, inferring its size
        images = images.view(images.size(0), -1)
        # add Gaussian noise to the images (0.5: noise_factor)
        noisy_images = images + 0.5 * torch.randn_like(images)
        # keep pixel values in [0, 1]; torch.clamp keeps this a tensor operation
        noisy_images = torch.clamp(noisy_images, 0., 1.)
        # place all the tensors on the same device
        images, noisy_images = images.to(device), noisy_images.to(device)
        # forward pass, Feeds a batch through our model
        outputs = model(noisy_images)
        # calculate the loss - how far the reconstruction is from the clean original image
        loss = criterion(outputs, images)
        # clear the gradients of all optimized variables from the last round
        optimizer.zero_grad()
        # backward pass, Propagate the loss signal backward
        loss.backward()
        # perform a single optimization step (update model parameters)
        optimizer.step()
        # Step the learning rate scheduler
        scheduler.step(loss)
        # update running training loss
        # .item() turns the loss into a Python number, detached from the autograd graph
        train_loss += loss.item() * images.size(0)
        # update tqdm progress bar
        pbar.update(1)
    # average per-sample loss (the running sum above is weighted by batch size)
    train_loss = train_loss / len(dataloader.dataset)
    print(f'Epoch: {epoch} \tTraining Loss: {train_loss:0.6f}')
# close tqdm progress bar
pbar.close()

Could you share how you are creating the scheduler (e.g., how the optimizer is passed to it) and check whether the optimizer's state dict and current learning rate look reasonable (e.g., via optimizer.state_dict() and optimizer.param_groups[0]['lr'])?

This is how I instantiate the optimizer and scheduler:

lr = 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)  # weight decay: L2 penalty
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)
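To see how this ReduceLROnPlateau configuration reacts when step() is called once per mini-batch, here is a small sketch. It assumes a single dummy parameter in place of model.parameters() and a made-up sequence of batch losses hovering around a plateau. The scheduler counts step() calls rather than epochs, so six consecutive non-improving values are enough to exceed patience=5:

```python
import torch

# Dummy parameter standing in for model.parameters()
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

# Hypothetical per-batch losses fluctuating around a plateau,
# as mini-batch losses typically do within a single epoch
batch_losses = [0.50, 0.52, 0.51, 0.53, 0.50, 0.52, 0.51]

lr_history = []
for loss in batch_losses:
    scheduler.step(loss)  # per-batch stepping, as in the original loop
    lr_history.append(optimizer.param_groups[0]["lr"])

print(lr_history)
```

With these hypothetical losses, the first six entries of lr_history stay at 1e-3 and the seventh drops to about 1e-4 (factor=0.1). Over an epoch of noisy batch losses this reduction can repeat many times, shrinking the learning rate long before the model has converged.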

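I found the problem: I had to move scheduler.step(loss) out of the inner (per-batch) loop to just after train_loss is computed, and pass train_loss instead of loss to the scheduler's step method. ReduceLROnPlateau counts calls to step(), not epochs, so with patience=5 the per-batch version could cut the learning rate by 10x after only six consecutive mini-batches without improvement.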
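Putting the fix together, a minimal self-contained sketch of the corrected loop might look like the following. The random 28x28 "images", the small fully connected autoencoder, the batch size, and the epoch count are hypothetical stand-ins for the Kaggle notebook's data and model, and the tqdm progress bar is left out to keep it dependency-free. The relevant part is that scheduler.step(train_loss) runs once per epoch, after the epoch-average loss is computed:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 256 random 1x28x28 "images" in [0, 1] with dummy labels
images_all = torch.rand(256, 1, 28, 28)
labels_all = torch.zeros(256, dtype=torch.long)
dataloader = DataLoader(TensorDataset(images_all, labels_all), batch_size=32, shuffle=True)

# Hypothetical small autoencoder on flattened 784-pixel inputs
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784), nn.Sigmoid())
criterion = nn.MSELoss()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

epochs = 3
lr_history = []
for epoch in range(1, epochs + 1):
    model.train()
    train_loss = 0.0
    for images, _ in dataloader:
        images = images.view(images.size(0), -1)
        noisy_images = torch.clamp(images + 0.5 * torch.randn_like(images), 0.0, 1.0)
        images, noisy_images = images.to(device), noisy_images.to(device)

        outputs = model(noisy_images)
        loss = criterion(outputs, images)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # no scheduler.step() here: the plateau scheduler should see an epoch-level metric
        train_loss += loss.item() * images.size(0)

    # average per-sample loss over the whole epoch
    train_loss /= len(dataloader.dataset)
    # step the plateau scheduler once per epoch, with the epoch-level loss
    scheduler.step(train_loss)
    lr_history.append(optimizer.param_groups[0]["lr"])
    print(f"Epoch: {epoch} \tTraining Loss: {train_loss:0.6f} \tLR: {lr_history[-1]:.1e}")
```

Because the scheduler now receives one smoothed value per epoch, patience=5 is measured in epochs rather than in noisy mini-batches.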

My mistake 🤦‍♂️
