DataLoader much slower than manual batching


(rooks) #1

Hi

I was trying to use a DataLoader to iterate over my training samples, but I don’t understand why it is slower than “manual batching”.

"Manual batching":

import torch

samples_tensor = torch.tensor(samples, dtype=torch.float).cuda()
labels_tensor = torch.tensor(labels, dtype=torch.long).cuda()

for e in range(nbEpochs):
    for b in range(nbSamples // batch_size):
        # take a contiguous slice of the pre-built GPU tensors
        x = samples_tensor[b * batch_size:(b + 1) * batch_size]
        y = labels_tensor[b * batch_size:(b + 1) * batch_size]

"With dataloader":

import torch
from torch.utils.data import DataLoader, TensorDataset

samples_tensor = torch.tensor(samples, dtype=torch.float).cuda()
labels_tensor = torch.tensor(labels, dtype=torch.long).cuda()

dset = TensorDataset(samples_tensor, labels_tensor)
data_train_loader = DataLoader(dset, batch_size=1000, shuffle=True)

for e in range(nbEpochs):
    for x, y in data_train_loader:
        pass  # the loop body would go here; iteration alone is already slow

The variant with the DataLoader is MUCH slower than the manual process. Am I missing something?

Thanks


(Simon Wang) #2

Because one is shuffled and the other one is not.


(rooks) #3

Thanks for your reply.

Even if I manually shuffle the tensors beforehand, the manual version stays much faster than the DataLoader.
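
For example, shuffling both tensors with the same permutation, something like:

perm = torch.randperm(nbSamples, device=samples_tensor.device)  # illustrative shuffle
samples_tensor = samples_tensor[perm]
labels_tensor = labels_tensor[perm]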


(Simon Wang) #4

Oh, I know why. Since all your dataset exposes is __getitem__, the DataLoader retrieves one sample at a time at different indices and then collates (stacks) them into a batch. It is done this way so it can be very general and work on any dataset.
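
Roughly, each batch is assembled like this (a simplified sketch of the sampler + collate path, not the actual implementation):

indices = torch.randperm(len(dset))[:1000]      # the shuffling sampler yields indices
batch = [dset[i] for i in indices]              # one __getitem__ call per sample
x = torch.stack([s for s, _ in batch])          # collate: stack 1000 small tensors
y = torch.stack([l for _, l in batch])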

However, in the case of a TensorDataset you already have all the data in memory, so you can batch much more efficiently. You may either:

  1. Shuffle the entire tensor beforehand, and then do contiguous slicing, or
  2. (slower, but still faster than the DataLoader way) use index_select (or advanced indexing) with a batch of indices.

My guess is that if your code mimics the DataLoader behavior, the two will run at similar speed.
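
A minimal sketch of both options, assuming the samples_tensor / labels_tensor / nbSamples / batch_size setup from your snippets:

# Option 1: shuffle once, then take contiguous slices
perm = torch.randperm(nbSamples, device=samples_tensor.device)
samples_shuffled = samples_tensor[perm]
labels_shuffled = labels_tensor[perm]
for b in range(nbSamples // batch_size):
    x = samples_shuffled[b * batch_size:(b + 1) * batch_size]
    y = labels_shuffled[b * batch_size:(b + 1) * batch_size]

# Option 2: gather each batch with index_select (no pre-shuffle of the data needed)
perm = torch.randperm(nbSamples, device=samples_tensor.device)
for b in range(nbSamples // batch_size):
    idx = perm[b * batch_size:(b + 1) * batch_size]
    x = samples_tensor.index_select(0, idx)
    y = labels_tensor.index_select(0, idx)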