Hey. I ran into the same problem: loading attribute labels from a large (1 GB+) numpy array saved as .npy on the server. I cannot load them using multiple workers (with num_workers > 0 the code just hangs there, waiting). When I set num_workers=0, every 100 iterations cost 100 seconds on loading and preprocessing the data and 50 seconds for training (the forward/backward step), which is really slow compared to not loading those numpy labels. I'm stuck on this problem and eagerly looking for solutions. Would you give me some advice?
Are you loading this large file once in your __init__?
If so, num_workers=0 should pre-load the data once and just slice it in __getitem__.
Using multiple workers might not be the best idea, as each worker will load the whole dataset, while the __getitem__ should be quite fast compared to the __init__.
Could you post the Dataset implementation?
Maybe using shared arrays might work better?
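Something like this is what I mean by loading once in `__init__` and slicing in `__getitem__` (the class name and file path are just placeholders):

```python
import numpy as np
from torch.utils.data import Dataset

class AttributeDataset(Dataset):
    """Loads the whole .npy file once, then only slices it per sample."""

    def __init__(self, labels_path):
        # One-time load; with num_workers=0 this runs exactly once.
        self.labels = np.load(labels_path)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Cheap in-memory slicing; no disk access here.
        return self.labels[index]
```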
In general, I want to load attribute labels from a large numpy file. The file is loaded in __init__ and the labels are returned in __getitem__. I wonder if multiple workers are supported in this situation.
The code for loading the image paths looks alright, although you could also pre-create the lists and just pass it to your Dataset instead of re-creating it in the __init__.
The same applies for attribute_list_path.
Note that the Dataset will be re-created for each epoch if you are using multiple workers, so each worker will reload the large numpy array.
Just preload it outside of your Dataset and pass it as an argument to it.
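A quick sketch of that pattern (the class name and shapes are made up; the `np.zeros` stands in for your `np.load` call):

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader

class PreloadedDataset(Dataset):
    def __init__(self, labels):
        # The array is created once outside and only referenced here,
        # so re-creating the Dataset does not reload the file.
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return self.labels[index]

# Load once in the main process, then hand the array to the Dataset.
labels = np.zeros((100, 5), dtype=np.float32)  # stand-in for np.load(...)
loader = DataLoader(PreloadedDataset(labels), batch_size=10)
```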
Thanks! That is really helpful advice and I will try it soon. By the way, do you mean my code got stuck because it loaded too much data into memory? Will multiple workers help if I preload the numpy array and just pass it as an argument to my Dataset, or will that cause a multi-processing resource-contention problem if several workers read the same data (by reference)?
Your reply is really helpful, thanks again!
I guess your code might just hang if the loading takes that long, or if you run out of memory.
However, I might be mistaken, and the pre-loaded dataset might also be copied when using multiple workers.
The shared array approach might speed things up.
As I linked in another ticket, I found that this implementation lacks vectorisation. When one retrieves data through the loader, MyDataset.__getitem__ will be called millions of times, which becomes a bottleneck for my training on the GPU. In Keras, we know that a larger batch_size reduces the training time; here, however, batch_size has little effect on the training time due to the loop over the training points. Is there any suggestion for avoiding this?
Each worker in your DataLoader will create the next batch in the background by calling __getitem__ to load the corresponding sample.
I’m not sure if there is a library to load e.g. images in a batched way.
As explained in the other topic: using multiple workers might speed your data loading up, if your hard disk is sufficiently fast.
Have a look at this post for more information.
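If the per-sample __getitem__ loop really is the bottleneck, one workaround is to disable automatic batching (batch_size=None) and pass a sampler that yields whole lists of indices, so __getitem__ can slice a full batch in one vectorised operation. A sketch with made-up data:

```python
import numpy as np
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, SequentialSampler

class BatchedDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, indices):
        # `indices` is a whole list from the BatchSampler, so this is
        # a single vectorised slice instead of a Python loop.
        return torch.from_numpy(self.data[indices])

data = np.random.rand(1000, 8).astype(np.float32)
sampler = BatchSampler(SequentialSampler(range(len(data))),
                       batch_size=64, drop_last=False)
# batch_size=None disables per-sample collation, so each element
# the sampler yields is passed straight to __getitem__.
loader = DataLoader(BatchedDataset(data), sampler=sampler, batch_size=None)
```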
The error I get is “RuntimeError: output with shape [1, 224, 224] doesn’t match the broadcast shape [3, 224, 224]”.
Would you give me some advice on how my transform should look in order to apply it to the converted numpy array?
Comparing the transform you defined with the error you got, it seems your input has only one channel while your transform expects three channels (looking at the normalization part):
You should change your normalization so that it only has one value for the mean and one for the standard deviation. Then the transform should be valid and you can just apply it to your converted numpy array.
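To see the mismatch and the fix concretely (the mean/std values are just placeholders; Normalize applies roughly this in-place per-channel arithmetic internally):

```python
import torch

img = torch.rand(1, 224, 224)  # single-channel image, like your converted array

# Three-channel stats: the in-place subtraction fails, because the
# broadcast result [3, 224, 224] cannot fit into the [1, 224, 224] input.
mean3 = torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1)
try:
    img.clone().sub_(mean3)
except RuntimeError as e:
    print(e)  # same "doesn't match the broadcast shape" error

# One value for mean and std matches the single channel:
mean1 = torch.tensor([0.5]).view(-1, 1, 1)
std1 = torch.tensor([0.5]).view(-1, 1, 1)
normalized = (img - mean1) / std1
```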