Time for data going through model

Hi, I noticed something interesting, and I am not sure whether it is my mistake or not.
When training a model, over the many mini-batches of one epoch, the forward pass for the first mini-batch takes 2.54 s, and then roughly 0 s for the next 10 mini-batches (I did not check the time for all 851 mini-batches, only the first 10). Please see the code:

    for batch_idx, (imgs, ...) in enumerate(train_loader):
        optimizer.zero_grad()
        image = imgs.to(device)
        time1_start = time.time()
        x, y = model(image)
        time1_end = time.time()
        print('time1: %.2f' % (time1_end - time1_start))
        # printed times: 2.54, 0.00, 0.00, 0.00, ...

So it seems that after the first mini-batch, the time for images to go through the model is nearly 0.
As a student, I am curious and want to understand the reason.
Thanks.

Your profiling is most likely wrong. E.g. if you are using a GPU, you would need to synchronize the code before starting and stopping the host timers, since CUDA operations are executed asynchronously: the forward call only enqueues the kernels on the GPU and returns immediately, so your host timer stops before the computation has actually finished.
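To make the effect concrete without needing a GPU, here is a minimal sketch using a worker thread as a stand-in for an asynchronous CUDA stream (the `fake_kernel` function and the 0.2 s sleep are made up for illustration). Submitting work returns immediately, so a naive host timer measures almost nothing; waiting for the result before stopping the timer, analogous to calling `torch.cuda.synchronize()`, measures the real duration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an asynchronous device: submitting work returns immediately,
# like launching a CUDA kernel, and the work finishes later.
executor = ThreadPoolExecutor(max_workers=1)

def fake_kernel():
    time.sleep(0.2)  # pretends to be a 0.2 s GPU forward pass

# Naive timing: measures only how long it takes to *enqueue* the work.
start = time.perf_counter()
future = executor.submit(fake_kernel)
naive = time.perf_counter() - start
future.result()  # drain the queue before the next measurement

# Correct timing: wait for the work to finish before stopping the timer,
# analogous to torch.cuda.synchronize() before time.time().
start = time.perf_counter()
future = executor.submit(fake_kernel)
future.result()
synced = time.perf_counter() - start

print(f"naive: {naive:.4f}s, synchronized: {synced:.4f}s")
executor.shutdown()
```

The naive measurement comes out near zero even though the "kernel" took 0.2 s, which matches the 0.00 s readings in your loop; the first iteration additionally pays one-time startup costs, which is why it alone looks slow.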
