Update Torch parameters once per epoch, not once per loop iteration

Hi there,

I was trying to train a network, and I don’t want the parameters to be updated in every loop iteration; I want them to be updated only once per epoch. How can I do that? Thank you.

Original

for epoch in range(10):
    for idx, (B, A) in enumerate(loop):
        ....
        opt.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()


and if I move the update out to the epoch level, it shows

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution


Cheers,
Tsai

Could you describe how the gradients and/or the loss are computed to update the parameters once per epoch? Or are you only using the loss calculated from the last batch (which would be quite wasteful)?
The error might be raised if you are running out of memory or if you are running into an unexpected error.
To debug the cuDNN error, could you post a minimal, executable code snippet reproducing the error as well as the output of python -m torch.utils.collect_env, please?

I was trying to train a GAN, and what I want to observe is how the output for the same input image changes from epoch to epoch. But in fact the parameters are being updated during the loop, so the generated images for all the images in the loader also change during the loop.

That is to say, within the same loop I want all the images in the loader to be processed with the same parameters.
(I would like to know whether the parameter update can happen only between epochs rather than between loop iterations.)
Thank you!

Here I try to post a minimal version of it. (Original)

import torch
import torch.nn as nn
import torch.optim as optim

opt_disc = optim.Adam(list(disc_A.parameters()) + list(disc_B.parameters()),
                      lr=config.LEARNING_RATE, betas=(0.5, 0.999))
opt_gen = optim.Adam(list(gen_B.parameters()) + list(gen_A.parameters()),
                     lr=config.LEARNING_RATE, betas=(0.5, 0.999))

L1 = nn.L1Loss()  
mse = nn.MSELoss()

for epoch in range(num_epochs):
    fake_A, fake_B = train_fn(disc_A, disc_B, gen_A, gen_B, loader, opt_disc, opt_gen, L1, mse, d_scaler, g_scaler, epoch)
    for idx, (B, A) in enumerate(loop):
        with torch.cuda.amp.autocast():
            '''in here i got D_loss'''
        opt_disc.zero_grad()
        d_scaler.scale(D_loss).backward()
        d_scaler.step(opt_disc)
        d_scaler.update()

        with torch.cuda.amp.autocast():
            '''in here i got G_loss'''
        opt_gen.zero_grad()
        g_scaler.scale(G_loss).backward()
        g_scaler.step(opt_gen)
        g_scaler.update()

Sorry, but it’s still unclear how the actual loss(es) are calculated.
Are you trying to accumulate the losses from each mini-batch and then calculate the gradients based on the final loss, or are you computing the gradients in each iteration but planning to call optimizer.step() only once per epoch?
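
For reference, here is a minimal sketch of those two options with a toy model and random data (the model, the data, and names such as total_loss are purely illustrative and not taken from the code above):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
data = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(10)]

# Option 1: accumulate the losses and call backward() once per epoch.
# Note that this keeps the computation graph of every batch alive until
# backward() is called, so memory usage grows with the number of batches.
opt.zero_grad()
total_loss = 0.0
for x, y in data:
    total_loss = total_loss + criterion(model(x), y)
(total_loss / len(data)).backward()
opt.step()

# Option 2: call backward() in every iteration (the gradients accumulate
# in the .grad attributes) and call optimizer.step() once per epoch.
opt.zero_grad()
for x, y in data:
    loss = criterion(model(x), y) / len(data)
    loss.backward()
opt.step()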

I also don’t understand the actual use case.

I think what I’m trying to do is "computing the gradients in each iteration, but calling optimizer.step() only once per epoch".

Let me give an example of the use case:

  1. We have 50 images in the dataloader, and I’m training a GAN network.
  2. What I observe: in the first epoch, the first image I generate is blurred, but image No. 50 is already very clear (batch size = 1) → this means the parameters are already being updated within the loop.
  3. What I want: in the first epoch, the first generated image is blurred and image No. 50 is blurred too; the images should only become clearer as the epochs increase.
# Here is how G_loss is computed; D_loss is computed similarly
loss_G_A = mse(D_A_fake, torch.ones_like(D_A_fake))  # adversarial loss for generator A
cycle_B_loss = L1(B, cycle_B)                         # cycle-consistency loss
G_loss = loss_G_A + cycle_B_loss

Thank you

In that case call backward() inside the DataLoader loop and optimizer.step() afterwards. Also note that each backward call will compute and accumulate the gradients, so you might consider scaling them (or dividing the actual loss by the number of batches) before calling optimizer.step().
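
Applied to the GAN snippet above, that pattern could look roughly like the sketch below. It reuses the names from the posted code, keeps the loss computations elided exactly as in the original, and assumes num_batches = len(loader); treat it as a sketch under those assumptions, not a drop-in replacement.

num_batches = len(loader)

for epoch in range(num_epochs):
    # Clear the gradients once per epoch instead of once per iteration.
    opt_disc.zero_grad()
    opt_gen.zero_grad()

    for idx, (B, A) in enumerate(loop):
        with torch.cuda.amp.autocast():
            '''in here i got D_loss'''
        # Divide by the number of batches so the accumulated gradient
        # corresponds to the mean loss over the epoch; only backward here.
        d_scaler.scale(D_loss / num_batches).backward()

        with torch.cuda.amp.autocast():
            '''in here i got G_loss'''
        g_scaler.scale(G_loss / num_batches).backward()

    # Update the parameters once per epoch.
    d_scaler.step(opt_disc)
    d_scaler.update()
    g_scaler.step(opt_gen)
    g_scaler.update()

One caveat: in the original per-iteration version, opt_disc.zero_grad() also wipes any discriminator gradients produced by the G_loss backward pass before the next discriminator step. With once-per-epoch updates those gradients would accumulate as well, so depending on how the losses are computed inside train_fn (which is not shown) you may need to detach the discriminator outputs in the generator loss or handle the discriminator gradients separately.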


Hi Ptrblck,

Thanks for helping. Based on your reply I have some ideas, and I’m now trying to use your suggestion to find a suitable solution.
If there are any updates, I’ll share them here. Thanks!

Cheers,
Tsai