Can I operate DataLoader asynchronously?

In my network, I have to do a lot of processing to transform each image in my Dataset's __getitem__ (the one the DataLoader calls), and this makes training much slower. I have tried num_workers and, thankfully, it helps a lot. However, I still want to speed up training further, so asyncio came to mind. Unfortunately, I know little about this library, and I am not sure whether it is compatible with num_workers.
Hope you can help me, thanks a lot!

That would make things very complicated, and I don't think the two would go hand in hand. If your data loading is becoming the bottleneck, you can apply the transforms once, save the intermediate tensors to disk, and load them directly during training.
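
To make the caching suggestion concrete, here is a minimal sketch. It is not the poster's code: `CachedDataset` and `SlowSquares` are hypothetical names, and pickle is used only to keep the example dependency-free. In a PyTorch project you would more likely subclass torch.utils.data.Dataset and use torch.save / torch.load. The wrapper works with any map-style dataset, meaning any object with __getitem__ and __len__.

```python
import pickle
import tempfile
from pathlib import Path


class CachedDataset:
    """Wrap a map-style dataset and cache each transformed sample on disk.

    Hypothetical helper for illustration; only suitable when the
    transform is deterministic (no random augmentation).
    """

    def __init__(self, base, cache_dir):
        self.base = base
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        path = self.cache_dir / f"{index}.pkl"
        if path.exists():
            # Cache hit: skip the expensive transform entirely.
            with path.open("rb") as f:
                return pickle.load(f)
        # Cache miss: run the expensive transform once and store the result.
        sample = self.base[index]
        with path.open("wb") as f:
            pickle.dump(sample, f)
        return sample


class SlowSquares:
    """Stand-in for a dataset with an expensive __getitem__."""

    def __init__(self, n):
        self.n = n
        self.calls = 0  # counts how often the "expensive" path runs

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        self.calls += 1
        return [index * index] * 3


base = SlowSquares(4)
with tempfile.TemporaryDirectory() as tmp:
    dataset = CachedDataset(base, tmp)
    first_epoch = [dataset[i] for i in range(len(dataset))]
    second_epoch = [dataset[i] for i in range(len(dataset))]
```

The second pass reads everything from disk, so the expensive transform runs only once per sample. Random augmentations should stay outside the cache and be applied after loading, otherwise every epoch would see identical samples.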

Currently, num_workers uses Python's multiprocessing. So in that case, it already runs the worker processes asynchronously to each other (depending on how it's used, you don't have to, though), so I assume using asyncio wouldn't be of much benefit.
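
A small sketch of why asyncio alone is unlikely to help with CPU-heavy transforms. The function names here are made up for illustration. A coroutine only gives up control at an await, so a coroutine that does pure computation runs to completion before the next one starts. Coroutines that wait on I/O (simulated below with asyncio.sleep) do overlap.

```python
import asyncio


def cpu_bound_transform(x):
    # Stand-in for an expensive image transform: pure computation, no I/O.
    return sum(i * i for i in range(10_000)) + x


async def load_cpu(index, log):
    log.append(("start", index))
    result = cpu_bound_transform(index)  # no await here: blocks the event loop
    log.append(("end", index))
    return result


async def load_io(index, log):
    log.append(("start", index))
    await asyncio.sleep(0.01)  # simulated I/O wait: yields to other coroutines
    log.append(("end", index))
    return index


async def run(loader, n):
    log = []
    await asyncio.gather(*(loader(i, log) for i in range(n)))
    return log


cpu_log = asyncio.run(run(load_cpu, 3))
io_log = asyncio.run(run(load_io, 3))
```

In cpu_log each sample starts and ends before the next one begins, so nothing runs concurrently. In io_log all three samples start before any of them finishes. That is why separate worker processes (what num_workers gives you) are the usual answer for CPU-bound transforms, while asyncio mainly pays off when loading is dominated by waiting on a network or database.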

I found it extremely useful to use async for my use case with Postgres-backed data. So I simply did not use the DataLoader at all. It took me a while to figure out that I would be better off without it, because I kept thinking that the built-in DataLoader was worth it. There's a writeup with a link to the code here: https://medium.com/@jonathan.wickens/using-postgres-as-a-dataloader-with-pytorch-bba0d5cbe1fa I hope it helps someone. Sorry for reviving an old topic, but I kept coming back to this post before I found a way forward!
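
For readers wondering what async loading without a DataLoader could look like, here is a rough sketch. It is not the code from the linked writeup. `fetch_batch` is a hypothetical stand-in for an asynchronous database query and is simulated with asyncio.sleep; a real version would go through an async Postgres driver. The generator requests the next batch before handing out the current one.

```python
import asyncio


async def fetch_batch(batch_index, batch_size):
    # Hypothetical stand-in for an async database query returning one batch.
    await asyncio.sleep(0.01)
    start = batch_index * batch_size
    return list(range(start, start + batch_size))


async def prefetching_batches(num_batches, batch_size):
    """Yield batches while the fetch of the following batch is already scheduled."""
    if num_batches <= 0:
        return
    next_task = asyncio.create_task(fetch_batch(0, batch_size))
    for i in range(num_batches):
        batch = await next_task
        if i + 1 < num_batches:
            # Schedule the next fetch before handing the current batch out.
            next_task = asyncio.create_task(fetch_batch(i + 1, batch_size))
        yield batch


async def consume():
    seen = []
    async for batch in prefetching_batches(3, 4):
        # A training step would go here.
        seen.append(batch)
    return seen


batches = asyncio.run(consume())
```

One caveat: the scheduled fetch only makes progress while the consumer is awaiting. A training step that blocks the event loop would have to be moved off it (for example with loop.run_in_executor) for the overlap to actually happen.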