My model can finish its training phase, but the validation phase throws an exception: RuntimeError: CUDA out of memory.
After adding ‘with torch.no_grad()’, the model works fine, but I wonder why it runs out of CUDA memory without ‘with torch.no_grad()’, and what ‘with torch.no_grad()’ actually changes.
My function is defined as follows:
for i in range(1, epochs + 1):
    train_loss, valid_loss = 0.0, 0.0
    for batch, (data, labels) in enumerate(trainloader):
        model.train()
        if is_cuda:
            data, labels = data.cuda(), labels.cuda()
        outs = model(data)
        loss = criterion(outs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss = train_loss + loss * data.size(0)
    with torch.no_grad():
        for batch, (data, labels) in enumerate(validloader):
            model.eval()
            if is_cuda:
                data, labels = data.cuda(), labels.cuda()
            outs = model(data)
            loss = criterion(outs, labels)
            valid_loss = valid_loss + loss * data.size(0)
    train_loss = train_loss / len(trainloader.dataset)
    valid_loss = valid_loss / len(validloader.dataset)
Using with torch.no_grad() disables gradient calculation. The reason it uses less memory is that it doesn’t store the intermediate tensors that would otherwise be needed to compute gradients of your loss. And because nothing is kept for the backward pass, evaluating your network is also quicker.
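You can see the effect directly by checking requires_grad on a result computed inside and outside the context manager. A minimal sketch (the toy tensors here are just for illustration, not your model):

```python
import torch

x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)

# Outside no_grad: accumulating the loss tensor keeps every
# iteration's autograd graph alive in memory.
total = 0.0
for _ in range(3):
    loss = (x @ w).sum()
    total = total + loss      # `total` now references the graph

print(total.requires_grad)    # True: graphs for all iterations retained

# Inside no_grad: no graph is built at all.
with torch.no_grad():
    total_ng = 0.0
    for _ in range(3):
        loss = (x @ w).sum()
        total_ng = total_ng + loss

print(total_ng.requires_grad) # False: nothing kept for backward
```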
In this project, my validation set is only one fifth the size of my training set, so I find it strange that the validation phase runs out of CUDA memory while the training phase doesn’t.
What initially comes to my mind is that you’re still holding on to your training set data in memory somewhere, so when you go to use your validation set you increase the total memory (as you’re adding new data but haven’t cleared the training data that you’re no longer using)!
(Might be wrong, so best to get a dev’s opinion but I have a feeling it could be something like this!)
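Another thing worth checking is the accumulation line itself: `valid_loss + loss * data.size(0)` stores a tensor that still references the batch’s autograd graph, so with gradients enabled the graphs from every validation batch pile up on the GPU. A minimal sketch of the usual fix, calling `.item()` to get a plain Python number (the tiny model and data here are stand-ins, not the poster’s actual network):

```python
import torch

# Illustrative stand-ins for the post's model and criterion.
model = torch.nn.Linear(3, 1)
criterion = torch.nn.MSELoss()
data, labels = torch.randn(8, 3), torch.randn(8, 1)

valid_loss = 0.0
for _ in range(3):
    outs = model(data)
    loss = criterion(outs, labels)
    # .item() detaches the scalar from the graph, so each batch's
    # graph can be freed immediately instead of being retained.
    valid_loss += loss.item() * data.size(0)

print(type(valid_loss))   # plain float, no autograd graph attached
```

With `.item()` the running total is an ordinary float, so nothing in the loop keeps GPU memory alive between batches even without `torch.no_grad()`.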
I have tried your idea; the validation phase still raises the CUDA error when ‘with torch.no_grad()’ is removed.
Code as follows:
for i in range(1, epochs + 1):
    valid_loss = 0.0
    for batch, (data, labels) in enumerate(validloader):
        model.eval()
        if is_cuda:
            data, labels = data.cuda(), labels.cuda()
        outs = model(data)[1]
        loss = criterion(outs, labels)
        valid_loss = valid_loss + loss * data.size(0)
    valid_loss = valid_loss / len(validloader.dataset)
    history.log(i, valid_loss=valid_loss)
    with canvas:
        canvas.draw_plot([history['valid_loss']])