Let’s say I have a large in-memory array of byte buffers.
If I use a datapipe like:
from torchdata.datapipes.iter import IterableWrapper
dp = IterableWrapper(very_big_array)
dp = dp.sharding_filter()
I believe this will deep-copy very_big_array, since the datapipe must be pickled and sent to each worker process.
Is there any way to get around this?
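For context, here is one workaround I've been considering: keep the data out of the datapipe entirely and have each worker re-open it from a shared backing store. This is just a sketch using a NumPy memory-mapped file (the file path, dtype, and shape are hypothetical); only the small (path, dtype, shape) triple would need to be pickled to workers, not the buffers themselves.

```python
import os
import tempfile

import numpy as np

# One-time setup: write the big array to a backing file.
path = os.path.join(tempfile.mkdtemp(), "buffers.dat")
big = np.arange(1_000_000, dtype=np.uint8)
big.tofile(path)

# Inside each worker, re-open the same file as a memory map instead
# of receiving a pickled copy of the array. The OS page cache is
# shared across processes, so no per-worker duplication occurs.
view = np.memmap(path, dtype=np.uint8, mode="r", shape=big.shape)

# The mapped view sees the same bytes as the original array.
assert bytes(view[:8]) == bytes(big[:8])
```

Would something like this play nicely with sharding_filter(), or is there a more idiomatic way within torchdata itself?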