Out of Memory During Evaluation

shenkev · November 2, 2017, 3:40am

I’m getting very strange behavior during evaluation when I call forward in a for-loop. I either hit a CUDA out-of-memory error or get weird results back from the evaluation function that I know are wrong (it seems like the correct variable gets reset or something). I think it’s related to this post

but I couldn’t tell from the comments how to free up the graph. Specifically, someone says:

This is because PyTorch will build the graph again and again, and all the intermediate states will be stored.
In training, the states will be cleared if you do backward.

Then how do you clear the states during evaluation? My evaluation function is below:

def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    for step, (story, question, answer) in enumerate(loader):
        story = Variable(story)
        question = Variable(question)
        answer = Variable(answer)
        _, answer = torch.max(answer, 1)

        if self.config.cuda:
            story = story.cuda()
            question = question.cuda()
            answer = answer.cuda()

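        # Forward pass, then take the index of the highest-scoring class.
        pred_prob = self.mem_n2n(story, question)[0]
        _, output_max_index = torch.max(pred_prob, 1)
        # Number of correct predictions in this batch, as a Python number.
        toadd = (answer == output_max_index).float().sum().data[0]
        correct = correct + toadd
        # Batch size.
        total = total + answer.size(0)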

    acc = correct / total
    return acc
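
For reference, here is a minimal sketch of an evaluation loop that does not store the graph. It assumes PyTorch 0.4 or later, where torch.no_grad() is available (on older versions, Variable(..., volatile=True) played a similar role). The toy model, the loader, and the standalone evaluate signature are hypothetical stand-ins, not my actual mem_n2n code.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, device="cpu"):
    """Return accuracy over `loader` without recording an autograd graph."""
    model.eval()  # put dropout / batch norm layers into inference mode
    correct = 0
    total = 0
    with torch.no_grad():  # nothing is recorded for backward in this block
        for inputs, targets in loader:
            inputs = inputs.to(device)
            targets = targets.to(device)
            logits = model(inputs)
            predictions = logits.argmax(dim=1)
            # .item() copies the count into a plain Python int, so no
            # tensor from this batch is kept alive by the running total.
            correct += (predictions == targets).sum().item()
            total += targets.size(0)
    return correct / total


# Hypothetical toy model and data, only to show the call.
model = nn.Linear(4, 3)
loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
acc = evaluate(model, loader)
```

The two things that seem to matter are running the forward pass inside torch.no_grad() and accumulating Python numbers rather than tensors, so nothing from one batch is held onto in the next.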