In the last case, with DataLoader(..., num_workers=0, ...), I observe the intended behaviour: dataset.max_id is updated.
How can I achieve this with e.g. DataLoader(..., num_workers=1, ...)?
I am using this in the context of large image files, where I want to read information from these images only upon calling them for training/validation. Thank you for any hint/advice!
@dsethz When you specify num_workers > 0, one or more child processes are spawned to perform the actual data loading. Each worker operates on its own copy of the dataset object, so max_id is updated in the child processes and not in the parent process where you are printing it.
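The effect can be reproduced without PyTorch. The following is a minimal sketch using only the standard library: `ToyDataset`, `worker`, and `max_id_after_child_run` are hypothetical names for illustration, and the `fork` start method is an assumption (POSIX systems only). The child process updates `max_id` on its own copy, and the parent never sees the change.

```python
import multiprocessing as mp


class ToyDataset:
    """Hypothetical stand-in for the dataset in the question."""

    def __init__(self):
        self.max_id = 0

    def __getitem__(self, idx):
        # Side effect analogous to updating dataset.max_id while loading.
        self.max_id = max(self.max_id, idx)
        return idx


def worker(dataset, indices):
    # Runs in the child process and mutates the child's copy only.
    for i in indices:
        dataset[i]


def max_id_after_child_run():
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork
    dataset = ToyDataset()
    proc = ctx.Process(target=worker, args=(dataset, [3, 7, 5]))
    proc.start()
    proc.join()
    # The parent's object was never touched by the child.
    return dataset.max_id


if __name__ == "__main__":
    print(max_id_after_child_run())
```

Running the same loop directly in the parent (the analogue of num_workers=0) would update `max_id` as expected, which matches what you observed.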
One way to get the max_ids from the child processes would be to put them in a multiprocessing queue in the child processes and read from the same queue in the parent process.
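Here is a rough sketch of the queue idea, again with plain `multiprocessing` rather than DataLoader itself. `ReportingDataset`, `worker`, and `collect_max_id` are hypothetical names, the `None` end-of-work marker is my own convention, and the `fork` start method is an assumption. How cleanly this carries over to DataLoader workers (especially with the `spawn` start method) is something you would need to verify.

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # assumption: POSIX system with fork


class ReportingDataset:
    """Hypothetical dataset that reports every new maximum through a queue."""

    def __init__(self, queue):
        self.queue = queue
        self.max_id = 0

    def __getitem__(self, idx):
        if idx > self.max_id:
            self.max_id = idx
            self.queue.put(idx)  # send the new local maximum to the parent
        return idx


def worker(dataset, indices):
    for i in indices:
        dataset[i]
    dataset.queue.put(None)  # marker: this worker has finished


def collect_max_id(index_chunks):
    queue = ctx.Queue()
    dataset = ReportingDataset(queue)
    procs = [ctx.Process(target=worker, args=(dataset, chunk))
             for chunk in index_chunks]
    for proc in procs:
        proc.start()

    max_id, finished = 0, 0
    # Drain the queue before joining so no worker blocks on a full pipe.
    while finished < len(procs):
        msg = queue.get()
        if msg is None:
            finished += 1
        else:
            max_id = max(max_id, msg)

    for proc in procs:
        proc.join()
    return max_id


if __name__ == "__main__":
    print(collect_max_id([[3, 7, 5], [2, 9, 4]]))
```

Each worker only knows its own local maximum, so the parent has to combine the reported values itself, as done with `max` above.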
Thank you for your feedback. I am not experienced with multiprocessing, but if I understand you correctly, I need a custom data loader in which I adapt _MultiProcessingDataLoaderIter?