Inference in evaluation mode takes more time than training

When I ran my program, I noticed that it was very slow. I tried to compare the processing time in train and eval mode with this simple script:

import time
import torch

for (image, target) in dataloader:
        image, target =,
        b_size = image.size()[0]
        t1 = time.time()
        output = model(image)
        t2 = time.time()
        with torch.no_grad():
            t3 = time.time()
            output = model(image)
            t4 = time.time()
        print('Train:', t2-t1)
        print('Eval:', t4-t3)       

Surprisingly, in every iteration except the first, evaluation mode takes about 4 times longer to process:

Train: 0.029598474502563477
Eval: 0.08679604530334473

I expected the training pass to take longer, because it has to build the graph for the gradients, update the running statistics in batch normalization, and so on. Any idea why this happens?

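If you are using a GPU, you have to synchronize the code via torch.cuda.synchronize() before starting and before stopping the timer. CUDA operations are executed asynchronously, so without synchronization the timer can stop while the GPU is still working, and the timings will be wrong.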
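
A minimal sketch of that timing pattern, in case it helps. `timed` is a hypothetical helper, not part of PyTorch; it takes the synchronization function as an argument, so it is an assumption on my side that you would pass torch.cuda.synchronize when the model runs on a GPU and leave it out on a CPU.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    sync is an optional callable that blocks until all pending device
    work has finished, e.g. torch.cuda.synchronize on a GPU.
    """
    if sync is not None:
        sync()  # make sure earlier queued work is not counted
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the work launched by fn() to finish
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With the names from the question, this could be called as timed(lambda: model(image), sync=torch.cuda.synchronize), once as is and once inside the torch.no_grad() block, and the two elapsed values compared.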

I see. Now I get the real processing time, thank you.