I am training two models, D and C, simultaneously on a training dataset. However, instead of using a train loader to create all mini-batches in one go at the beginning of each epoch (let’s call it vanilla loading), I am fetching one mini-batch at each iteration. This is because I am using a WeightedRandomSampler whose sample weights are updated at each iteration.
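For context, the sampling scheme I mean can be sketched in plain Python with random.choices as a stdlib analogy to torch.utils.data.WeightedRandomSampler (the weight-update rule below is just a placeholder, not my actual update):

```python
import random

random.seed(0)

dataset = list(range(10))              # stand-in for train_dataset indices
sample_weights = [1.0] * len(dataset)  # one weight per sample
batch_size = 4

for iteration in range(3):
    # draw one mini-batch with replacement, proportional to the current weights
    batch = random.choices(dataset, weights=sample_weights, k=batch_size)
    # placeholder update: upweight the samples just seen
    for idx in batch:
        sample_weights[idx] *= 1.1
```

Because the weights change every iteration, each mini-batch has to be drawn against the latest weights, which is why I cannot materialise all batches up front.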
The issue is that I am seeing a huge time overhead compared to vanilla loading. I used torch.cuda.Event to time the function executions (How to measure time in PyTorch) but I am not able to figure out where the overhead is coming from. Any help would be deeply appreciated.
epoch_tic = torch.cuda.Event(enable_timing=True)
epoch_toc = torch.cuda.Event(enable_timing=True)
epoch_tic.record()
for epoch in range(1, epochs + 1):
    # Other timer initialisations
    iterations = int(len(train_dataset) / batch_size) + 1
    iterations_time_tic = torch.cuda.Event(enable_timing=True)
    iterations_time_toc = torch.cuda.Event(enable_timing=True)
    iterations_time_tic.record()
    for iter in range(iterations):
        tic = torch.cuda.Event(enable_timing=True)
        toc = torch.cuda.Event(enable_timing=True)
        tic.record()
        # a new DataLoader (and its worker process) is created at every iteration
        train_loader = torch.utils.data.DataLoader(
            train_dataset, batch_size=batch_size, num_workers=1, pin_memory=True,
            sampler=WeightedRandomSampler(sample_weights, batch_size, replacement=True))
        toc.record()
        torch.cuda.synchronize()
        dataload_Time += tic.elapsed_time(toc) / 1000
        for batch_idx, (data, target, data_idx) in enumerate(train_loader):
            # Train models D and C and time them
            # Update sample weights at each iteration
            ........
            .........
    iterations_time_toc.record()
    torch.cuda.synchronize()
    iterations_time.append(iterations_time_tic.elapsed_time(iterations_time_toc) / 1000)
    .............
    epoch_toc.record()
    torch.cuda.synchronize()
    epoch_time.append(epoch_tic.elapsed_time(epoch_toc) / 1000)
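Since the DataLoader construction and worker startup happen on the CPU, I also tried cross-checking the CUDA-event numbers with a plain wall-clock helper (a minimal sketch using the stdlib time.perf_counter; the helper name and section labels are my own, not PyTorch API):

```python
import time
from collections import defaultdict

section_times = defaultdict(float)

def timed(name, fn, *args, **kwargs):
    # accumulate wall-clock time per named section (CPU-side work only)
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    section_times[name] += time.perf_counter() - start
    return result

# usage sketch: wrap each per-iteration step with a section label,
# e.g. loader creation, the inner batch loop, and the weight update
data = timed("make_list", list, range(1000))
total = timed("sum", sum, data)
```

Summing the per-section totals and comparing against the overall loop time should show which section accounts for the gap in iterations_time.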
And these are the execution time values I obtained:
Note 1. All the averaged values shown were averaged across the total number of iterations.
Note 2. If I revert to vanilla loading, all the numbers add up and the overhead in iterations_time seen here does not appear.
Dataloader_time: [0.0663442611694336]
Avg_Dataloader_time: [0.00014145897903930404]
TrainD_time: [8.034557580947876]
TrainC_time: [13.494966745376587]
Avg_TrainD_time: [0.01713125283784195]
Avg_TrainC_time: [0.028773916301442617]
Validation_time: [8.723237037658691]
update_weights_time: [0.00391077995300293]
Avg_update_weights_time: [8.338550006402836e-06]
before_train_time: [0.00018644332885742188]
after_train_before_validation: [0.0005524158477783203]
after_validation_before_test: [0.0006937980651855469]
iterations_time: [74.6968047618866]
iterations_print_time: [1.430511474609375e-06]
Test_time: []
epoch_time: [83.42318558692932]