I am trying to speed up data loading. Here is a common flow I have found:
from torch.utils.data import DataLoader
# some code
loader = DataLoader(your_dataset, ..., pin_memory=True)
data_iter = iter(loader)
next_batch = next(data_iter)  # start loading the first batch
next_batch = [_.cuda(non_blocking=True) for _ in next_batch]  # with pin_memory=True, this copies the batch to the GPU asynchronously

for i in range(len(loader)):
    batch = next_batch
    if i + 1 != len(loader):
        # start copying data of the next batch
        next_batch = next(data_iter)
        next_batch = [_.cuda(non_blocking=True) for _ in next_batch]
Is there a better way of accomplishing this? It just seems like it could be done more elegantly.
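One direction I was considering: wrapping the prefetch logic in a generator so the training loop stays clean. This is only a sketch; `transfer` is a hypothetical callable standing in for the `[_.cuda(non_blocking=True) for _ in batch]` copy, so the same pattern can be shown without a GPU:

```python
def prefetch(loader, transfer):
    """Yield transferred batches, kicking off the transfer of batch i+1
    before batch i is handed to the caller."""
    it = iter(loader)
    try:
        next_batch = transfer(next(it))  # transfer the first batch up front
    except StopIteration:
        return
    for batch in it:
        # start transferring the upcoming batch, then yield the ready one
        current, next_batch = next_batch, transfer(batch)
        yield current
    yield next_batch  # last batch

# usage sketch with the DataLoader above:
# for batch in prefetch(loader, lambda b: [_.cuda(non_blocking=True) for _ in b]):
#     train_step(batch)
```

With `pin_memory=True` and `non_blocking=True` the copy in `transfer` should overlap with the work done on the previous batch, which is the same effect as the manual loop above.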