My model can finish its training phase, but the validation phase throws an exception: RuntimeError: CUDA out of memory.
After adding ‘with torch.no_grad()’, the model works fine, but I wonder why it runs out of CUDA memory without ‘with torch.no_grad()’, and what ‘with torch.no_grad()’ actually changes.
My function is defined as follows:
for i in range(1, epochs + 1):
    train_loss, valid_loss = 0.0, 0.0
    for batch, (data, labels) in enumerate(trainloader):
        model.train()
        if is_cuda:
            data, labels = data.cuda(), labels.cuda()
        outs = model(data)
        loss = criterion(outs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss = train_loss + loss * data.size(0)
    with torch.no_grad():
        for batch, (data, labels) in enumerate(validloader):
            model.eval()
            if is_cuda:
                data, labels = data.cuda(), labels.cuda()
            outs = model(data)
            loss = criterion(outs, labels)
            valid_loss = valid_loss + loss * data.size(0)
    train_loss = train_loss / len(trainloader.dataset)
    valid_loss = valid_loss / len(validloader.dataset)
Using with torch.no_grad() disables gradient calculation. The reason it uses less memory is that it doesn’t store the intermediate tensors that would otherwise be needed to compute gradients of your loss. And because nothing is kept for the backward pass, evaluating your network is also quicker.
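You can see the effect directly by checking requires_grad on a result computed inside and outside the context manager. A minimal sketch (the toy tensors here are just for illustration, not your model):

```python
import torch

x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)

# Outside no_grad: accumulating the loss tensor keeps every
# iteration's autograd graph alive in memory.
total = 0.0
for _ in range(3):
    loss = (x @ w).sum()
    total = total + loss      # `total` now references the graph

print(total.requires_grad)    # True: graphs for all iterations retained

# Inside no_grad: no graph is built at all.
with torch.no_grad():
    total_ng = 0.0
    for _ in range(3):
        loss = (x @ w).sum()
        total_ng = total_ng + loss

print(total_ng.requires_grad) # False: nothing kept for backward
```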
In this project, my validation set is only one fifth the size of my training set, so I find it strange that the validation phase runs out of CUDA memory while the training phase doesn’t.
What initially comes to my mind is that you’re still holding on to your training set data in memory somewhere, so when you go to use your validation set you increase the total memory (as you’re adding new data but haven’t cleared the training data that you’re no longer using)!
(Might be wrong, so best to get a dev’s opinion but I have a feeling it could be something like this!)
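Another thing worth checking is the accumulation line itself: `valid_loss + loss * data.size(0)` stores a tensor that still references the batch’s autograd graph, so with gradients enabled the graphs from every validation batch pile up on the GPU. A minimal sketch of the usual fix, calling `.item()` to get a plain Python number (the tiny model and data here are stand-ins, not the poster’s actual network):

```python
import torch

# Illustrative stand-ins for the post's model and criterion.
model = torch.nn.Linear(3, 1)
criterion = torch.nn.MSELoss()
data, labels = torch.randn(8, 3), torch.randn(8, 1)

valid_loss = 0.0
for _ in range(3):
    outs = model(data)
    loss = criterion(outs, labels)
    # .item() detaches the scalar from the graph, so each batch's
    # graph can be freed immediately instead of being retained.
    valid_loss += loss.item() * data.size(0)

print(type(valid_loss))   # plain float, no autograd graph attached
```

With `.item()` the running total is an ordinary float, so nothing in the loop keeps GPU memory alive between batches even without `torch.no_grad()`.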
I have tried your idea; the validation phase still raises the CUDA error when ‘with torch.no_grad()’ is removed.
Code as follows:
for i in range(1, epochs + 1):
    valid_loss = 0.0
    for batch, (data, labels) in enumerate(validloader):
        model.eval()
        if is_cuda:
            data, labels = data.cuda(), labels.cuda()
        outs = model(data)[1]
        loss = criterion(outs, labels)
        valid_loss = valid_loss + loss * data.size(0)
    valid_loss = valid_loss / len(validloader.dataset)
    history.log(i, valid_loss=valid_loss)
    with canvas:
        canvas.draw_plot([history['valid_loss']])