How to cache an entire dataset in multiprocessing?

jrcavani · October 4, 2023, 3:11am

This thread has some links that capture the solutions well. Currently, the DistributedSampler code still creates Python lists, which are copied on read and cause high memory usage.
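
To make the point about Python lists concrete, here is a minimal sketch of the usual workaround: keep the indices in one contiguous buffer instead of a list of separate int objects. In CPython, even reading a list element updates that object's reference count, so the memory pages holding it get written to and copied in forked processes. A typed array is a single object with a raw buffer. Note that `CompactShardSampler` is a hypothetical name of mine, not a PyTorch class. It only imitates the pad-and-stride sharding that a distributed sampler performs, using the standard library so it runs without torch installed.

```python
from array import array
import random


class CompactShardSampler:
    """Hypothetical stand-in for a distributed sampler; NOT a PyTorch class."""

    def __init__(self, dataset_len, num_replicas, rank, shuffle=True, seed=0):
        if not 0 <= rank < num_replicas:
            raise ValueError("rank must be in [0, num_replicas)")
        self.dataset_len = dataset_len
        self.num_replicas = num_replicas
        self.rank = rank
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0
        # Ceiling division so every rank yields the same number of samples.
        self.num_samples = -(-dataset_len // num_replicas)
        self.total_size = self.num_samples * num_replicas

    def set_epoch(self, epoch):
        # Changing the epoch changes the shuffle order on every rank alike.
        self.epoch = epoch

    def _indices(self):
        # One contiguous buffer of signed 64-bit ints, not a list of int objects.
        indices = array("q", range(self.dataset_len))
        if self.shuffle:
            # Same seed on every rank, so all ranks agree on the permutation.
            random.Random(self.seed + self.epoch).shuffle(indices)
        # Pad by wrapping around until the length divides evenly across ranks.
        while 0 < len(indices) < self.total_size:
            indices.extend(indices[: self.total_size - len(indices)])
        # A strided slice of an array is still an array.
        return indices[self.rank : self.total_size : self.num_replicas]

    def __iter__(self):
        return iter(self._indices())

    def __len__(self):
        return self.num_samples


if __name__ == "__main__":
    for rank in range(4):
        sampler = CompactShardSampler(10, num_replicas=4, rank=rank, shuffle=False)
        print(rank, list(sampler))
```

The same idea applies if you hold the indices in a NumPy array or a torch tensor instead. The important part is that the bulk data lives in one buffer rather than in millions of small Python objects.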