I don’t know what category to put this in, so I hope that people read the uncategorized bucket.
I’m using AllenNLP for some NLP work. AllenNLP uses PyTorch (version 1.9.0 in my case), and I’m on Python 3.7.9 and macOS 11.6.1 (the latest Big Sur). I’m trying to use multiprocessing (and yes, I’m using torch.multiprocessing) to distribute some decoding work among multiple subprocesses, using torch.multiprocessing.Pool. The problem I’m having is that, no matter what I try, the actual processing time (which I measure inside the children, after the processes have started up, etc.) increases with the number of subprocesses.
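To make the setup concrete, here is a minimal sketch of what I’m doing; the real work is AllenNLP decoding, which I’ve replaced with a placeholder, and all the names are simplified stand-ins:

```python
import time
import torch.multiprocessing as mp

def decode_batch(batch):
    """Stand-in for the real decoding work done in each child."""
    start = time.perf_counter()      # timed inside the child, after startup
    result = [x * 2 for x in batch]  # placeholder for the actual model decode
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    batches = [list(range(1000)) for _ in range(8)]
    with mp.Pool(processes=4) as pool:
        outputs = pool.map(decode_batch, batches)
    # These per-child elapsed times are what grow as I add more workers.
    times = [t for _, t in outputs]
```

The timing is deliberately taken inside the worker, so process startup, pickling, and queue overhead are excluded from what I’m measuring.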
I know that PyTorch makes use of shared memory, and shared memory doesn’t seem to distinguish read-only from read-write access. My assumption is that the model is being locked so that only one subprocess at a time can read from it. But I can’t tell for sure, because I can’t figure out where the sharing is happening. What’s more bizarre to me is that even when I load the model separately in each subprocess, the processing time still scales with the number of subprocesses. I’ve convinced myself that I really am loading the models independently in the subprocesses.
What occurs to me now is that perhaps the shared-memory implementation works on process groups and keys off the model class, rather than the model instance, to decide whether to use shared memory. But I’m nowhere near a good enough C++ programmer to confirm this by reading the source. Can anyone shed any light on this issue?
Thanks in advance.