Training using DataLoader; running time doesn't match

Hi; I’m training a model on the CelebA dataset as follows:

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),  # Normalize expects a tensor, so convert the PIL image first
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = torchvision.datasets.CelebA(root='./data', split="train",
                                       download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size,
                                           shuffle=True, num_workers=128)

testset = torchvision.datasets.CelebA(root='./data', split="test",
                                      download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=args.batch_size,
                                          shuffle=False, num_workers=128)
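For what it's worth, the Normalize((0.5, ...), (0.5, ...)) step maps each channel via (x - mean) / std; since ToTensor() scales pixels into [0, 1], the result lands in [-1, 1]. A minimal sketch of that arithmetic (pure Python, names hypothetical, no torchvision required):

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization as applied by transforms.Normalize."""
    return (x - mean) / std

# ToTensor() scales pixel values to [0, 1]; Normalize then maps them to [-1, 1].
print(normalize(0.0))  # -1.0  (black pixel)
print(normalize(1.0))  #  1.0  (white pixel)
print(normalize(0.5))  #  0.0  (mid gray)
```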

Now my code looks like this:

def Train():
    epoch_start_time = time.time()  # track how long this entire epoch takes
    total_time_used = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        batch_start_time = time.time()  # track how long one batch takes
        data = data.to(device)  # device is a CUDA device
        loss = compute_loss(data)
        total_time_used += time.time() - batch_start_time  # add the time spent on this batch
    print(time.time() - epoch_start_time)  # this gives the entire epoch's run time

for epoch in range(total_epochs):
    Train()

However, I’ve found that total_time_used and time.time() - epoch_start_time differ by around 30 s.

I’m wondering why this is the case?

Nope, I'm not seeing any problem in your code; maybe you should get a better CPU and a better GPU, and it will be very fast.

Hi; I’ve timed the code with time.time() and updated the post to show that some of the timings don't match. Do you mind taking a look? Thanks

The first fetch of the dataloader loop (for data in loader) won’t be recorded in total_time_used. This corresponds to the initial loading of batches by all of the specified workers, which happens before batch_start_time is first set. More generally, any time spent waiting for the loader between iterations falls outside the batch_start_time window, so it shows up in the epoch time but never in the per-batch sum.
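The effect can be reproduced without PyTorch at all. A minimal sketch (all names hypothetical) with a generator standing in for a slow DataLoader:

```python
import time

def slow_loader(n_batches, load_time=0.05):
    """Simulates a DataLoader: producing each batch takes load_time seconds."""
    for i in range(n_batches):
        time.sleep(load_time)  # "data loading" happens here, outside the loop body
        yield i

epoch_start_time = time.time()
total_time_used = 0.0
for batch in slow_loader(10):
    batch_start_time = time.time()  # the 0.05 s spent fetching is already gone
    time.sleep(0.01)                # stand-in for the forward/backward pass
    total_time_used += time.time() - batch_start_time

epoch_time = time.time() - epoch_start_time
print(f"epoch: {epoch_time:.2f}s, summed batches: {total_time_used:.2f}s")
# epoch time is roughly 0.6 s, but the summed batch time is only roughly 0.1 s:
# the fetch time between iterations is never counted in total_time_used.
```

The same gap appears in the real training loop: the larger the loader's per-batch fetch time (and its initial worker warm-up), the larger the difference between the epoch timer and the summed batch timer.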