I’m using a DataLoader to train a model. Unlike the usual data loader, which loads one image per sample and one batch of images per iteration, I want to load a sequence of images, and possibly a sequence of files, all together. This means that a batch contains num_of_batch*num_of_images_per_batch images. Naively, we could use a for loop inside the dataset’s __getitem__ function, but Python’s for loops are very slow. Is there any way to improve this?
Would it work if you set the batch_size to num_of_batch*num_of_images in your DataLoader and just reshape the batch after loading, i.e.:
for data, target in loader:
    data = data.view(num_of_batch, num_of_images, channels, w, h)
    target = ...
In case it’s not possible, I would try to load the files sequentially and see if it’s really the bottleneck of your training procedure.
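To make the reshape idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the images are stored sequence by sequence, that shuffle=False so consecutive images stay grouped, and that every batch is full (drop_last=True). The sizes, the fake data, and the per-image targets are all hypothetical and only there for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical sizes, chosen only for illustration.
num_of_batch, num_of_images = 2, 3
channels, w, h = 3, 4, 5
num_sequences = 4
total = num_sequences * num_of_images

# Fake images stored sequence by sequence; image i is filled with the value i.
images = (
    torch.arange(total, dtype=torch.float32)
    .view(total, 1, 1, 1)
    .expand(total, channels, w, h)
    .contiguous()
)
# Assumed target: the index of the sequence each image belongs to.
targets = torch.arange(total) // num_of_images

loader = DataLoader(
    TensorDataset(images, targets),
    batch_size=num_of_batch * num_of_images,  # flat batch of single images
    shuffle=False,   # keep the images of one sequence adjacent
    drop_last=True,  # a partial batch could not be reshaped with view
)

shapes = []
for data, target in loader:
    # Regroup the flat batch into (sequences, images per sequence, C, W, H).
    data = data.view(num_of_batch, num_of_images, channels, w, h)
    target = target.view(num_of_batch, num_of_images)
    shapes.append(tuple(data.shape))
```

Note that with shuffle=True the flat batch would mix images from different sequences, so the reshape would no longer group them correctly. If sequences need to be shuffled, one option is to shuffle at the sequence level instead, for example with a custom sampler.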