Hello, I have a custom dataLoader class that I created. It has some extensive functionality for which I have to use a Python multiprocessing pool. Now I wonder whether I will be able to apply num_workers > 0 to such a dataLoader.
E.g.:
# Custom dataLoader class,
# which internally uses a multiprocessing pool with 8 workers
trainData = dataLoader.dataset(
    train=True,
    batchSize=32
)
# The dataLoader class itself returns one object holding a full batch of 32 samples.
# Now creating a torch DataLoader wrapper on it:
dl = DataLoader(trainData, batch_size=1, num_workers={some_num > 0})
Now my question is: if I set num_workers to some non-zero value, it doesn't raise an error. But when I try to get a sample out of it,
sample = next(iter(dl))
it takes forever to return anything; probably it didn't work at all.
Is this expected?
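For context, I suspect the internal pool is clashing with the DataLoader workers: a multiprocessing.Pool cannot be pickled and shipped to a worker process (which spawn-based workers would require), and a fork-inherited copy of a pool has dead helper threads, which would explain a silent hang rather than an error. A stdlib-only check of the pickling restriction (this assumes nothing about torch itself):

```python
import multiprocessing as mp
import pickle

def pool_is_picklable():
    pool = mp.Pool(2)
    try:
        # This is what a spawn-mode DataLoader worker would need to do
        # to receive a dataset that holds a live Pool.
        pickle.dumps(pool)
        result = True
    except NotImplementedError:
        # CPython raises: "pool objects cannot be passed between
        # processes or pickled"
        result = False
    finally:
        pool.close()
        pool.join()
    return result

if __name__ == "__main__":
    print(pool_is_picklable())  # False: a live Pool cannot cross process boundaries
```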
My main requirement is that I need to use prefetch_factor in the torch DataLoader, which works only when num_workers is not 0.
The reason behind putting batchSize=32 in the custom class is that each sample generated by the dataLoader is a list of arrays, and the individual arrays are not guaranteed to have exactly the same dimensions across the dataset. Hence I have to vertically stack the arrays from 32 samples and return them as one huge array. The torch DataLoader wrapper does not support such variable-length data per sample if we use num_workers.
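One thing I considered: the stacking could also live in a collate_fn passed to the torch DataLoader (collate_fn is a real DataLoader parameter), so the batching would not need to happen inside the custom class at all. A stdlib-only sketch of the vertical stacking such a collate_fn would do, under the assumption (true for my data) that rows share a width and only the row count varies per sample:

```python
import random

random.seed(0)

def make_sample(width=4):
    # Hypothetical stand-in for one dataset sample: a 2-D array as a
    # list of rows, where the number of rows varies per sample.
    n_rows = random.randint(1, 5)
    return [[float(i)] * width for i in range(n_rows)]

def collate_vstack(samples):
    # np.vstack-style stacking: concatenate all rows from all samples
    # into one big 2-D structure. Passed as collate_fn=, a torch
    # DataLoader would apply this per batch inside each worker.
    stacked = []
    for s in samples:
        stacked.extend(s)
    return stacked

samples = [make_sample() for _ in range(32)]
batch = collate_vstack(samples)
print(len(batch), len(batch[0]))
```

With that, DataLoader(trainData, batch_size=32, collate_fn=collate_vstack, num_workers=..., prefetch_factor=...) would let the torch workers parallelize the per-batch work instead of the internal pool.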
Is it possible to set num_workers > 0 for such a custom dataLoader class?