torch.utils.data.DataLoader num_workers for a custom dataLoader class

Hello, I have created a custom dataLoader class. It has some extensive functionality for which I have to use a Python multiprocessing pool. Now I wonder whether I will be able to use num_workers > 0 with such a dataLoader.
E.g.

from torch.utils.data import DataLoader

# Custom dataLoader class,
# which uses a multiprocessing pool with 8 threads
trainData = dataLoader.dataset(
        train=True,
        batchSize=32
    )
# The custom dataLoader class itself returns an object holding a batch of 32
# Now create a torch DataLoader wrapper on it
dl = DataLoader(trainData, batch_size=1, num_workers=some_num)  # any some_num > 0

If I set num_workers to some non-zero value, it doesn’t raise an error. But when I try to get a sample out of it with

sample = next(iter(dl))

it takes forever to return anything; it probably isn’t working at all.
Is this expected?

My main requirement is to use prefetch_factor in the torch DataLoader, which works only when num_workers is not 0.

The reason for setting batchSize=32 in the custom class is that each sample generated by the dataLoader is a list of arrays. The individual arrays are not guaranteed to have the same dimensions across the dataset. Hence I have to vertically stack the arrays from 32 samples and return them as one huge array.

The torch DataLoader wrapper does not support variable length data per sample if we use num_workers

Is it possible to use num_workers with such a custom dataLoader class?

I don’t think this behavior depends on the number of workers, but the default collate_fn would fail to create a single batch using inputs with different shapes.
If you want to apply a custom creation of the batch you could write a custom collate_fn and pass it to the DataLoader.
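As a rough illustration of the custom collate_fn idea, here is a minimal sketch. It assumes each sample is a list of 2-D NumPy arrays whose row count varies between samples while the column count per list position stays fixed; ToyVariableDataset and vstack_collate are hypothetical names made up for this example, not part of the original code.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyVariableDataset(Dataset):
    """Hypothetical stand-in: each sample is a list of 2-D arrays with a varying row count."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        rows = idx % 3 + 1  # 1, 2 or 3 rows, depending on the index
        return [np.full((rows, 4), idx, dtype=np.float32),
                np.full((rows, 2), idx, dtype=np.float32)]


def vstack_collate(samples):
    """Vertically stack each list position across all samples of one batch."""
    # zip(*samples) groups the first arrays together, then the second arrays, etc.
    stacked = [torch.from_numpy(np.vstack(arrays)) for arrays in zip(*samples)]
    # Keep the per-sample row counts so the stacked arrays can be split again later.
    lengths = torch.tensor([sample[0].shape[0] for sample in samples])
    return stacked, lengths


if __name__ == "__main__":
    dl = DataLoader(ToyVariableDataset(), batch_size=4, collate_fn=vstack_collate)
    arrays, lengths = next(iter(dl))
    print([tuple(a.shape) for a in arrays], lengths.tolist())
```

Because vstack_collate is a plain top-level function, it can be pickled and should also be usable with num_workers > 0, as long as the dataset itself does not start its own pool.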

I don’t think it’s expected to take “forever” so I would guess that the multiprocessing used in the DataLoader and your custom multiprocessing approach might block each other / hang etc.
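If the two layers of multiprocessing really are blocking each other (a guess, not a confirmed diagnosis), one possible workaround is to drop the inner pool and let the DataLoader workers provide the parallelism. The sketch below assumes the dataset can build one full batch per __getitem__ call in a single process; PreBatchedDataset is a hypothetical placeholder for the custom class. Passing batch_size=None disables automatic batching, so each pre-built batch is returned as-is, and prefetch_factor can be used because num_workers > 0.

```python
import numpy as np
from torch.utils.data import DataLoader, Dataset


class PreBatchedDataset(Dataset):
    """Hypothetical dataset: one __getitem__ call builds one whole batch, with no inner pool."""

    def __init__(self, num_batches=4, batch_size=32):
        self.num_batches = num_batches
        self.batch_size = batch_size

    def __len__(self):
        return self.num_batches

    def __getitem__(self, batch_idx):
        # Placeholder for the expensive per-sample work; row counts vary per sample.
        parts = [np.full((i % 3 + 1, 4), batch_idx, dtype=np.float32)
                 for i in range(self.batch_size)]
        # Stack the variable-length samples into one large array.
        return np.vstack(parts)


if __name__ == "__main__":
    # batch_size=None turns off automatic batching: every dataset item is already a batch.
    dl = DataLoader(PreBatchedDataset(), batch_size=None,
                    num_workers=2, prefetch_factor=2)
    for batch in dl:
        print(batch.shape)
```

With this layout each worker process prepares whole batches ahead of time, which is what prefetch_factor controls; whether it matches the throughput of the original 8-thread pool would need to be measured.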

I shall take a look at creating the custom collate_fn.
Regarding taking forever, I also meant that one process is blocking another.
If I don’t use num_workers, my dataLoader with 8 threads takes around 1.5 seconds to load a batch. But when I set num_workers=1 (or any positive number), nothing came back even after 5-10 seconds.