How to cache an entire dataset in multiprocessing?

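This thread has some links that capture the solutions well. Currently, the DistributedSampler code still creates Python lists, which are subject to copy-on-read in forked worker processes: reading an element updates its reference count, so the copy-on-write pages holding it get copied into each worker, and memory usage grows large.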
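One common way around this is to avoid holding many small Python objects at all and instead pack the data into a few large buffers, so that reading an item does not touch per-item reference counts. Below is a minimal sketch of that idea using only the standard library. `PackedList` is a hypothetical name for illustration, not something from PyTorch, and in practice people often use a NumPy array or a torch tensor as the backing buffer instead of `bytes`.

```python
import pickle
from array import array


class PackedList:
    """Read-only sequence that stores all items in one contiguous bytes buffer.

    Hypothetical helper: only two large objects (the buffer and the offsets
    array) are shared with worker processes, instead of one object per item.
    """

    def __init__(self, items):
        blobs = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL) for item in items]
        # End offset of each pickled item inside the buffer (signed 64-bit ints).
        self._ends = array("q")
        total = 0
        for blob in blobs:
            total += len(blob)
            self._ends.append(total)
        self._buffer = b"".join(blobs)

    def __len__(self):
        return len(self._ends)

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = self._ends[index - 1] if index > 0 else 0
        end = self._ends[index]
        # Only this small slice is copied and unpickled on access.
        return pickle.loads(self._buffer[start:end])
```

A Dataset could build such an object once in `__init__` (for example from its list of sample metadata) and index into it from `__getitem__`. The trade-off is a small unpickling cost on every access in exchange for memory that stays shared across workers.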