My PyTorch program runs slowly. I located the bottleneck with this timing code:
```python
from timeit import default_timer as timer

t1 = timer()
# ... some code here ...
t2 = timer()
print(t2 - t1)
```
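(One caveat I am aware of with this kind of timing: CUDA kernels launch asynchronously, so a plain wall-clock timer can bill a previous operation's GPU work to whatever line happens to synchronize next. A minimal helper that accepts an optional synchronization callable such as `torch.cuda.synchronize` — the helper name `timed` is mine, just for illustration:)

```python
from timeit import default_timer as timer

def timed(fn, *args, sync=None):
    """Time fn(*args). sync is an optional callable such as
    torch.cuda.synchronize, run before and after timing so pending
    asynchronous GPU work is not billed to the wrong line."""
    if sync is not None:
        sync()
    t1 = timer()
    out = fn(*args)
    if sync is not None:
        sync()
    return out, timer() - t1

result, elapsed = timed(sum, range(1000))
```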
After running this code, I found the bottleneck:
```python
for batch_idx, (x1, x2, y) in enumerate(train_loader.get_augmented_iterator(model.training)):
    x1 = torch.Tensor(x1).to(device)  # slow in this line of code
    x1 = x1.transpose(1, 3)
```
Here `get_augmented_iterator` is a function I defined to load the data. The first line inside the loop, `x1 = torch.Tensor(x1).to(device)`, takes ~0.4 s per batch, while `get_augmented_iterator()` itself takes only ~0.1 s and includes some preprocessing steps I have to perform at this stage.
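(In case it is relevant: assuming `x1` comes out of the iterator as a NumPy array, `torch.Tensor(x1)` always allocates a new buffer, copies, and converts to float32, whereas `torch.from_numpy` shares the array's memory. A sketch of the variant I could try — the shapes are illustrative and the pinned-memory line is commented out since it needs a CUDA device:)

```python
import numpy as np
import torch

x1_np = np.random.rand(8, 64, 64, 3).astype(np.float32)  # stand-in batch

# from_numpy shares memory with the array instead of copying/converting
x1 = torch.from_numpy(x1_np)

# On a CUDA machine the host-to-device copy could then overlap compute:
# x1 = x1.pin_memory().to(device, non_blocking=True)

x1 = x1.transpose(1, 3)  # NHWC -> NCHW, as in the loop above
```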
The problem is probably not in this line itself. After some research I found that if I call `torch.cuda.empty_cache()` first, the line executes in normal time. However, `torch.cuda.empty_cache()` itself takes ~0.4 s, so that doesn't actually solve anything; it does suggest the CUDA caching allocator is involved.
I also tried several other projects that use very similar dataloaders, but I could not reproduce the problem with their code. So my question is: what kind of error in my code could cause this, and how can I fix it?
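(Since `empty_cache()` changes the behaviour, one thing I can do to gather evidence is log the caching allocator's counters around the slow line; if reserved memory keeps growing while allocated memory does not, fragmentation from varying batch shapes would be a plausible suspect. A small helper — the name is mine, and it returns `None` on a CPU-only machine:)

```python
import torch

def allocator_snapshot():
    # Counters from the CUDA caching allocator; None without a GPU
    if not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated(),  # bytes in live tensors
        "reserved": torch.cuda.memory_reserved(),    # bytes held by the cache
    }

print(allocator_snapshot())
```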