Parallelization when using a batch size of 1 where each item is a list


Sorry if the title is unclear; formulating a short one for my question is a bit tricky.
I’m working on an existing project and I’m wondering whether the data loading actually runs in parallel or whether some workers sit idle.
Let’s assume the data is loaded the following way:

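import torch
from torch.utils.data import DataLoader

class Dataset(torch.utils.data.Dataset):
    def __init__(self, ...):
        self.minibatches = [
            ... # Load list of minibatch indices with batch size 16
        ]

    def __len__(self):
        return len(self.minibatches)

    def __getitem__(self, index):
        # Each item is a whole minibatch (a list of indices), not a single sample
        return self.minibatches[index]

def custom_collate_fn(minibatch):
    data = []
    for i in range(len(minibatch)):
        data.append(Load(minibatch[i]))
        ... # code to pad and convert to tensor ...
    return data

train_dataset = Dataset(...)
training_loader = DataLoader(train_dataset, batch_size=1, num_workers=8, collate_fn=custom_collate_fn)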

Data loading can’t possibly run in parallel because of that for loop, right? Even though it seems fast with a minibatch of 16, aren’t we losing potential speedup over a whole epoch on a big dataset?

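Turns out I had misunderstood how PyTorch handles the workers. They do work in parallel, with each worker building a separate batch; they don’t cooperate to fetch the samples of a single batch. So the for loop inside custom_collate_fn runs sequentially within one worker while the other workers prepare other minibatches at the same time.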
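
For anyone who wants to check this on their own setup, here is a minimal sketch that records which worker built each minibatch. The names `MinibatchDataset`, `collate_with_worker_id`, and `trace_workers` are made up for illustration, and the tensor conversion is only a stand-in for the real `Load` and padding code. It relies on `torch.utils.data.get_worker_info()`, which returns `None` in the main process and a worker descriptor inside a worker.

```python
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class MinibatchDataset(Dataset):
    """Hypothetical dataset where each item is a whole minibatch of indices."""

    def __init__(self, num_samples=64, minibatch_size=16):
        indices = list(range(num_samples))
        self.minibatches = [
            indices[i:i + minibatch_size]
            for i in range(0, num_samples, minibatch_size)
        ]

    def __len__(self):
        return len(self.minibatches)

    def __getitem__(self, index):
        return self.minibatches[index]


def collate_with_worker_id(items):
    # With batch_size=1, `items` is a one-element list wrapping one minibatch.
    (minibatch,) = items
    info = get_worker_info()  # None when num_workers=0 (main process)
    worker_id = info.id if info is not None else -1
    # Stand-in for the real per-sample loading; it runs sequentially
    # inside whichever worker was handed this minibatch.
    data = torch.tensor(minibatch, dtype=torch.float32)
    return worker_id, data


def trace_workers(num_workers=2):
    """Return (worker_id, samples) for every minibatch in one pass."""
    loader = DataLoader(
        MinibatchDataset(),
        batch_size=1,
        num_workers=num_workers,
        collate_fn=collate_with_worker_id,
    )
    return [(worker_id, data.tolist()) for worker_id, data in loader]


if __name__ == "__main__":
    for worker_id, samples in trace_workers(num_workers=2):
        print(f"worker {worker_id} built the minibatch starting at {int(samples[0])}")
```

Each minibatch comes back tagged with a single worker id, which matches the explanation above: the work is split across workers per batch, not per sample within a batch.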