Fastest way to evaluate, on the GPU, the error over a whole data set that fits in memory

I want to evaluate the error of a model on the whole training set (multiple times). I know this can be done with the DataLoader class, with code something like this:

from torch.autograd import Variable

def evaluate_mdl_data_set(loss, error, net, dataloader, enable_cuda):
    """Evaluate the average loss and error of the model over a data set."""
    running_loss, running_error = 0, 0
    for i, data in enumerate(dataloader):
        inputs, labels = extract_data(enable_cuda, data, wrap_in_variable=True)
        outputs = net(inputs)
        running_loss += loss(outputs, labels).data[0]  # .item() in PyTorch >= 0.4
        running_error += error(outputs, labels)
    # average over the number of batches
    return running_loss/(i+1), running_error/(i+1)

def extract_data(enable_cuda, data, wrap_in_variable=False):
    inputs, labels = data
    if enable_cuda:
        inputs, labels = inputs.cuda(), labels.cuda()
    if wrap_in_variable:
        inputs, labels = Variable(inputs), Variable(labels)
    return inputs, labels

Is it possible to do it in one go, by keeping everything on the GPU from the start and evaluating the whole set at once?
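For concreteness, here is a sketch of the kind of thing I have in mind (this uses the newer `torch.no_grad()` API instead of `Variable`; `net`, `loss`, and the error metric are just placeholders, and it assumes the full data set fits in GPU memory):

```python
import torch
import torch.nn as nn

def evaluate_whole_set(loss, net, inputs, labels, enable_cuda):
    """Evaluate loss and error in a single forward pass over the whole set."""
    # Move the full tensors (and the model) to the GPU once, up front.
    if enable_cuda and torch.cuda.is_available():
        inputs, labels = inputs.cuda(), labels.cuda()
        net = net.cuda()
    with torch.no_grad():  # no gradients needed for evaluation
        outputs = net(inputs)
        total_loss = loss(outputs, labels).item()
        # classification error rate as one example of an "error" metric
        preds = outputs.argmax(dim=1)
        error_rate = (preds != labels).float().mean().item()
    return total_loss, error_rate

# usage sketch with random data
net = nn.Linear(4, 3)
X = torch.randn(100, 4)
y = torch.randint(0, 3, (100,))
l, e = evaluate_whole_set(nn.CrossEntropyLoss(), net, X, y, enable_cuda=False)
```

The idea is that, since the data never leaves the GPU and there is no per-batch Python loop, the whole evaluation is a single kernel-launch sequence rather than many small ones.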