Slow data loading, perhaps caused by the CUDA cache?

My PyTorch program runs slowly. I located the bottleneck by timing individual lines:

from timeit import default_timer as timer
... (timed code here)

After running this code, I found the bottleneck:

    for batch_idx, (x1, x2, y) in enumerate(train_loader.get_augmented_iterator()):
        x1 = torch.Tensor(x1).to(device)  # slow in this line of code
        x1 = x1.transpose(1, 3)

Here get_augmented_iterator is a function I defined to load the data. The first line, x1 = torch.Tensor(x1).to(device), takes ~0.4 s to execute. get_augmented_iterator() itself takes about 0.1 s and includes some preprocessing steps I have to perform at this stage.

Suspecting the problem is not really in this line itself, I researched a little and found that if I add torch.cuda.empty_cache() before it, the line executes in normal time. However, torch.cuda.empty_cache() itself takes ~0.4 s, so this doesn't actually solve the problem, but it does suggest the cache is involved.

I tried several other projects that use very similar dataloaders, but I couldn't reproduce the problem with their code. So my question is: how could this relate to an error in my code, and how can I fix it?

Update: after several more runs, I found that if I comment out the loss.backward() line, the speed returns to normal. Since I obviously have to keep that line, how can I make training run at normal speed?

I believe the data must somehow be overflowing, forcing PyTorch to clear the cache at every iteration. But why and how would that cause this problem?

CUDA operations are executed asynchronously, so you need to synchronize manually before starting and stopping your timers via torch.cuda.synchronize().
Based on your description, you are "moving" the accumulated time from one operation to the next blocking one by commenting lines out: without synchronization, the next blocking operation (here the host-to-device copy) absorbs the time of all previously launched asynchronous operations, such as loss.backward().
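A minimal sketch of the correct timing pattern (tensor shapes and names here are illustrative, and the snippet falls back to CPU when CUDA is unavailable):

```python
import torch
from timeit import default_timer as timer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x1 = torch.randn(64, 32, 32, 3)  # stand-in for one batch from the loader

# Synchronize BEFORE starting the timer so pending async kernels
# (e.g. a previous loss.backward()) are not billed to this line.
if device.type == "cuda":
    torch.cuda.synchronize()
start = timer()

x1 = x1.to(device)       # the line being measured
x1 = x1.transpose(1, 3)  # NHWC -> NCHW

# Synchronize AFTER as well, so the timer stops only once
# the GPU has actually finished the work launched above.
if device.type == "cuda":
    torch.cuda.synchronize()
elapsed = timer() - start
print(f"transfer + transpose: {elapsed:.4f}s")
```

With this bracketing, each timed region reports only its own cost, and the 0.4 s should reattach itself to the backward pass rather than to the data-loading line.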