I’ve been reading up on PyTorch and had my mind blown by the shared-memory handling of torch.Tensor when passed through torch.multiprocessing queues.
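For anyone who hasn’t seen it, here’s a minimal sketch of the behavior I mean (as I understand the torch.multiprocessing docs, a tensor put on a queue has its storage moved into shared memory, so the consumer gets a view of the same data rather than a pickled copy):

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    t = q.get()   # arrives backed by shared memory, not as a pickled copy
    t += 1        # in-place write, visible to the producer

if __name__ == "__main__":
    q = mp.Queue()
    t = torch.zeros(4)
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    q.put(t)      # storage is moved into shared memory on put()
    p.join()
    print(t)      # tensor([1., 1., 1., 1.]): the worker's write shows up
```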
In general, I’ve done a lot of numpy array processing using Python’s multiprocessing module, but the pickling of the arrays across process boundaries is not ideal. I’d assume that the same tricks PyTorch is using for Tensors could be carried over to plain numpy arrays? If not, what is it that stands in the way?
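The closest thing I’ve found in the standard library is multiprocessing.shared_memory (Python 3.8+), where you allocate the buffer yourself and pass only the metadata through the queue. A rough sketch of that workaround, with the sizes and array shapes made up for illustration:

```python
import numpy as np
from multiprocessing import Process, Queue, shared_memory

def worker(q):
    name, shape, dtype = q.get()                 # only metadata gets pickled
    shm = shared_memory.SharedMemory(name=name)  # attach to the block, no copy
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr *= 2                                     # in-place, visible to the parent
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8 * 1024)
    arr = np.ndarray((1024,), dtype=np.float64, buffer=shm.buf)
    arr[:] = 1.0
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    q.put((shm.name, arr.shape, arr.dtype))      # metadata only, not the data
    p.join()
    print(arr[:4])                               # [2. 2. 2. 2.]
    shm.close()
    shm.unlink()
```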
It’s image processing with really large (tens of GBs) images. The pattern is usually to have one reader putting chunks of the image onto a queue, a bunch of workers cranking through them and placing their results on a writer queue, and then a writer writing out the results.
The workers are often calling libraries that don’t release the GIL, so you’re stuck with multiprocessing.
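Concretely, the pattern looks roughly like the sketch below, where read_chunks and process are hypothetical stand-ins for the real image I/O and the GIL-holding library call. Note that every put() still pickles the whole chunk, which is the overhead I’d like to avoid:

```python
import numpy as np
from multiprocessing import Process, Queue

N_WORKERS = 4

def read_chunks():
    # stand-in for the reader slicing a huge image into chunks
    for _ in range(16):
        yield np.random.rand(256, 256)

def process(chunk):
    # stand-in for the GIL-holding library call
    return chunk * 2

def worker(in_q, out_q):
    while True:
        chunk = in_q.get()
        if chunk is None:          # shutdown sentinel from the reader
            break
        out_q.put(process(chunk))  # pickles the whole result, again
    out_q.put(None)                # tell the writer this worker is done

def writer(out_q):
    done = 0
    while done < N_WORKERS:        # drain until every worker signs off
        result = out_q.get()
        if result is None:
            done += 1
        else:
            _ = result.sum()       # stand-in for writing the result to disk

if __name__ == "__main__":
    in_q, out_q = Queue(), Queue()
    workers = [Process(target=worker, args=(in_q, out_q)) for _ in range(N_WORKERS)]
    writer_proc = Process(target=writer, args=(out_q,))
    for p in workers:
        p.start()
    writer_proc.start()
    for chunk in read_chunks():    # every put() pickles the whole chunk
        in_q.put(chunk)
    for _ in workers:
        in_q.put(None)             # one sentinel per worker
    for p in workers:
        p.join()
    writer_proc.join()
```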