Possible way to limit IO reading?

Is there a way to limit IO reads when using DataLoader?

I’m running an IO-intensive training job, and when I set num_workers=4 the whole machine gets stuck.

Well, if your dataset is small enough and you have enough RAM, you can write your own dataset class that loads all of your data into RAM. That way you only read the data from the hard drive(s) once, at the beginning. That is what I did in one of my projects and it worked great.

Thanks! Any instructions or lines of code for loading data to RAM?

For instance, if you load images with PIL, instead of calling Image.open(path) in the __getitem__ method, you can build a dict in the dataset's __init__ that maps each image path to its Image object, and then look the image up by its path in __getitem__. I hope that makes sense.
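
Here is a minimal sketch of that idea, not a definitive implementation. The class name InMemoryFileDataset and the decode hook are hypothetical names chosen for illustration. It caches the raw file bytes rather than Image objects, because Image.open is lazy and does not read the pixel data until the image is actually used, so storing the bytes makes sure the disk read really happens up front. It assumes the whole dataset fits in RAM.

```python
from pathlib import Path


class InMemoryFileDataset:
    """Map-style dataset that reads every file once and then serves it from RAM.

    Hypothetical helper for illustration; it only implements __len__ and
    __getitem__, which is the map-style dataset protocol.
    """

    def __init__(self, paths, decode=None):
        self.paths = [Path(p) for p in paths]
        # Hypothetical hook: turns raw bytes into a sample (identity by default).
        self.decode = decode if decode is not None else (lambda raw: raw)
        # Single pass over the disk: cache the raw bytes, keyed by path.
        self.cache = {p: p.read_bytes() for p in self.paths}

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # No disk access here; only the in-memory cache is touched.
        return self.decode(self.cache[self.paths[index]])
```

With PIL you could pass something like decode=lambda raw: Image.open(io.BytesIO(raw)).convert("RGB") (after importing io and PIL.Image), and then hand the dataset to DataLoader as usual. Decoding still happens on every access, but it costs CPU rather than disk IO.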


This is definitely one way to solve it. I’m also curious whether it is possible to explicitly limit IO reads in DataLoader, or somewhere else in PyTorch.