I tried using the volatile=True parameter on the variables and it didn’t help. I am using the same batch size. I am not doing anything special to use cuDNN; I am using the default settings.
def validate(self, dev_corpus):
    # Turn on evaluation mode which disables dropout.
    self.model.eval()
    dev_batches = helper.batchify(dev_corpus.data, self.config.batch_size)
    print('number of dev batches = ', len(dev_batches))
    dev_loss = 0
    num_batches = len(dev_batches)
    for batch_no in range(1, num_batches + 1):
        session_queries, session_query_length, rel_docs, rel_docs_length, doc_labels = helper.session_to_tensor(
            dev_batches[batch_no - 1], self.dictionary)
        if self.config.cuda:
            session_queries = session_queries.cuda()
            session_query_length = session_query_length.cuda()
            rel_docs = rel_docs.cuda()
            rel_docs_length = rel_docs_length.cuda()
            doc_labels = doc_labels.cuda()
        loss = self.model(session_queries, session_query_length, rel_docs, rel_docs_length, doc_labels)
        if loss.size(0) > 1:
            loss = loss.mean()
        dev_loss += loss.data[0]
    return dev_loss / num_batches
I am using the above function for evaluation. Here, session_queries, session_query_length, … and the rest of the variables are created with volatile=True enabled.
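For context, here is a minimal sketch of the two ways to disable graph construction during evaluation; the model and data below are placeholders for illustration, not the code from the post above:

import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(10, 2)   # placeholder model
data = torch.randn(4, 10)  # placeholder batch

# Pre-0.4 style: volatile inputs propagate through the forward pass,
# so no intermediate buffers are kept for backward.
# (On 0.4+ this flag is a no-op and only emits a warning.)
x = Variable(data, volatile=True)
out = model(x)

# 0.4+ style: wrap the forward pass in no_grad instead.
with torch.no_grad():
    out = model(data)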
I am a novice and have the same problem, but during inference. I have never hit an ‘out of memory’ error before without using torch.no_grad() or volatile=True, but this time it does not seem to work unless I use torch.no_grad(). I am using pytorch 3.0.0.
You might run out of memory if you still hold references to some tensors from your training iteration.
Since Python uses function scoping, these variables are still kept alive, which might result in your OOM issue. To avoid this, you could wrap your training and validation code in separate functions. Have a look at this post for more information.
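As a rough sketch of that pattern (the toy model and the train_step/val_step names are just placeholders): once each phase lives in its own function, the tensors created inside it, including the graph attached to the training loss, go out of scope when the function returns instead of staying alive for the whole loop body.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
crit = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(data, target):
    optimizer.zero_grad()
    loss = crit(model(data), target)
    loss.backward()
    optimizer.step()
    # loss (and the graph hanging off it) is freed once this returns
    return loss.item()

def val_step(data, target):
    with torch.no_grad():
        return crit(model(data), target).item()

for epoch in range(2):
    train_loss = train_step(torch.randn(8, 10), torch.randint(0, 2, (8,)))
    val_loss = val_step(torch.randn(8, 10), torch.randint(0, 2, (8,)))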
The computation graph will be created and intermediate tensors are stored.
If you don’t call backward (which wouldn’t even be possible in a torch.no_grad() block), nothing else will change.
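To make that concrete, a small sketch with a toy model (just for illustration): after a plain forward pass the output carries a grad_fn, i.e. the graph and its intermediates exist even if backward is never called, while under torch.no_grad() no graph is built at all.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
x = torch.randn(4, 10)

out = model(x)
print(out.requires_grad, out.grad_fn)  # True <AddmmBackward...>: a graph was built

with torch.no_grad():
    out = model(x)
print(out.requires_grad, out.grad_fn)  # False None: no graph, no intermediates kept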
Well, you would call backward in the training portion. So would you then update the net with the grads tracked in the validation portion as well as those from the training portion, assuming torch.no_grad() was forgotten in validation?
During training a new computation graph would usually be created, as long as you don’t pass e.g. the output of your validation phase as the new input to the model during training.
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)

# Pseudo validation phase
x1 = torch.randn(1, 3, 224, 224)
out = model(x1)

# Pseudo training phase
x1 = torch.ones(1, 3, 224, 224)
out = model(x1)
out.mean().backward()
In this code snippet you have “forgotten” to use torch.no_grad() during the validation phase.
However, since the validation out is never used in a backward call (it is overwritten in the training phase), it won’t have any effect on the gradients; it will just use unnecessary memory.
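A quick way to check this (a toy comparison, not from the original posts): run the training step with and without the “forgotten” validation forward pass and compare the resulting gradients; they come out identical, so the extra forward pass only costs memory.

import torch
import torchvision.models as models

torch.manual_seed(0)
model = models.resnet18()

x_val = torch.randn(1, 3, 224, 224)
x_train = torch.ones(1, 3, 224, 224)

# Run 1: training step only
model.zero_grad()
model(x_train).mean().backward()
ref_grad = model.conv1.weight.grad.clone()

# Run 2: "forgotten" validation forward pass, then the same training step
model.zero_grad()
_ = model(x_val)                 # graph is built and held, but never backward-ed
model(x_train).mean().backward()
print(torch.allclose(ref_grad, model.conv1.weight.grad))  # True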
import torch
import torch.nn as nn
import torchvision.models as models

net = models.resnet18()
crit = nn.CrossEntropyLoss()  # stand-in for nn.SomeLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

# placeholder data and targets
some_data = torch.randn(2, 3, 224, 224)
target = torch.randint(0, 1000, (2,))
some_validation_data = torch.randn(2, 3, 224, 224)
valid_target = torch.randint(0, 1000, (2,))
num_epochs = 1

for e in range(num_epochs):
    # training
    pred = net(some_data)
    optimizer.zero_grad()
    loss = crit(pred, target)
    loss.backward()
    optimizer.step()
    # validation (torch.no_grad() "forgotten" here)
    valid_pred = net(some_validation_data)
    loss = crit(valid_pred, valid_target)