Hello. When I use iter(dataloader) to create an iterator over my dataloader, creating the iterator is very slow. Can anyone give some suggestions? Here is my code:
dataset_train = CellDataset(data_frame=train) # train is a large pandas dataframe already in memory
dataloader_train = DataLoader(dataset_train, batch_size=batchsize, shuffle=True, num_workers=48)
dataiter = iter(dataloader_train) # this line is very slow, about 30s
data = next(dataiter) # this line is normal
My case is a little special, since my data is not image files on an SSD but a large pandas dataframe in memory (about 8 GB). I also use a large number of workers. If I reduce the number of workers to, say, 8, then dataiter = iter(dataloader_train) is faster, but data = next(dataiter) is slower. When I try regular image data on an SSD, everything is normal.
Each worker will create a copy of the Dataset, so if you preload the data, your memory usage should increase a lot, especially if you are using 48 workers.
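If you want to verify that, one rough sketch (assuming psutil is installed; the worker_init_fn hook just prints each worker's resident memory) would be:

import os
import psutil
from torch.utils.data import DataLoader

def report_worker_memory(worker_id):
    # print the resident set size of this worker process in MB
    rss = psutil.Process(os.getpid()).memory_info().rss / 1024**2
    print('worker {} uses {:.0f} MB'.format(worker_id, rss))

dataloader_train = DataLoader(dataset_train, batch_size=batchsize, shuffle=True,
                              num_workers=48, worker_init_fn=report_worker_memory)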
I’m wondering why a single worker seems to be so slow, since you are only slicing in-memory data.
Could you try to use torch.from_numpy(self.data_frame.iloc[idx].values).float() in your __getitem__ and profile the code again?
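In other words, something like this (just a minimal sketch; your actual CellDataset isn't shown, so the assumption here is that all columns are numeric features):

import torch
from torch.utils.data import Dataset

class CellDataset(Dataset):
    def __init__(self, data_frame):
        self.data_frame = data_frame

    def __len__(self):
        return len(self.data_frame)

    def __getitem__(self, idx):
        # .values yields a numpy array for this row; assumes all columns are numeric
        return torch.from_numpy(self.data_frame.iloc[idx].values).float()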
Thanks for your response. I use a machine with 48 threads and 256 GB RAM, so I preload the data and set num_workers=48.
I tried your code and got a similar result. It seems that num_workers=0 does not feed the GPU enough; utilization stays below 30%. If I use 48 workers, the feeding speed is very fast, but the initialization of the iterator is slow. For example, if I use
for i, d in enumerate(dataloader_train):
    print(i, d.shape)
with 48 workers, it takes about 30 s before anything is printed. After that it is very fast, since the 48 workers have started to work. If I use 0 workers, there is almost no initialization time, but the printing part is slow.
I guess the multiprocessing under the DataLoader is not well suited to a very large preloaded dataset?
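A rough way to see where the time goes (a small sketch, using the same dataloader_train as above):

import time

t0 = time.time()
dataiter = iter(dataloader_train)   # with num_workers > 0 this spawns the worker processes
t1 = time.time()
data = next(dataiter)               # first batch
t2 = time.time()
print('iter() took {:.1f}s, first next() took {:.1f}s'.format(t1 - t0, t2 - t1))

Newer PyTorch versions also accept persistent_workers=True in the DataLoader, which keeps the worker processes alive across epochs, so the start-up cost is paid once instead of at the beginning of every epoch.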
@xnnba @ptrblck Hi, I am having the same issue: iter(dataloader_train) is very slow, even with an hdf5 dataset. The issue also shows up during training, so my GPU utilization is ultra low, ~1%.
How do you store the data (local SSD, network drive, etc.)?
Also, is the first iteration slow or all of them?
Have a look at this post for some background information.
@ptrblck I have an SSD, and I’m on Windows (I know…). Reading the hdf5 file itself is very fast, but when I feed it to either the map-style or iterable-style dataset class, then to the DataLoader, and then iterate over it, it’s very slow, and the next() calls are all equally slow too…
Note: For what it’s worth, running some torchvision dataset, e.g. MNIST from a PyTorch *.pt file, is pretty fast.
Are you sure you are reading the content of the hdf5 file or are you just initializing it?
Could you post a small code snippet showing how you open the file and access it?
Thanks for the code snippet!
I think the File() call might just open the file handle, but not actually read the data from your disk; the actual read might be performed during the indexing.
I’ve created this dummy code snippet to play around with it a bit:
import time

import h5py
import numpy as np

# Setup: write a dummy dataset (~800 MB of float64) to disk
d1 = np.random.random(size=(1000, 1000, 100))
hf = h5py.File('data.h5', 'w')
hf.create_dataset('dataset_1', data=d1)
hf.close()

# Load
t0 = time.time()
hf = h5py.File('data.h5', 'r')  # only opens the file handle
t1 = time.time()
print('open took {:.3f}ms'.format((t1 - t0) * 1000))

t0 = time.time()
n1 = hf.get('dataset_1')  # returns a dataset handle, no data is read yet
t1 = time.time()
print('get took {:.3f}ms'.format((t1 - t0) * 1000))

t0 = time.time()
n1 = np.array(n1)  # this actually reads the full array from disk
t1 = time.time()
print('reading array took {:.3f}ms'.format((t1 - t0) * 1000))

t0 = time.time()
data = hf['dataset_1']  # again just a handle
t1 = time.time()
print('get via [] took {:.3f}ms'.format((t1 - t0) * 1000))

nb_iters = 100
t0 = time.time()
for idx in np.random.randint(0, 1000, (nb_iters,)):
    x = data[idx]  # each index triggers a read from disk
t1 = time.time()
print('random index takes {:.3f}ms per index'.format((t1 - t0) / nb_iters * 1000))
I’m no expert in using hdf5, but it seems that the indexing takes a lot more time than opening the file and getting the dataset handle, which might point to lazy loading.
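If the lazy indexing is indeed the bottleneck, one thing to try (a small sketch, reusing the data.h5 file from the snippet above) is to read the whole dataset into memory once and index the resulting numpy array instead of the h5py handle:

import h5py
import numpy as np

hf = h5py.File('data.h5', 'r')
data_in_memory = hf['dataset_1'][:]   # [:] forces an eager read into a numpy array
hf.close()

# indexing the in-memory array no longer touches the disk
x = data_in_memory[np.random.randint(0, 1000)]

That of course only works if the dataset fits into RAM.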