I want to build a logging buffer in shared memory so I can batch my writes to the filesystem, since every write becomes a network call when I run my setup in Kubernetes.
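A minimal sketch of the buffer itself, assuming all local ranks run on the same node. It uses multiprocessing.shared_memory.SharedMemory, which (unlike a multiprocessing.Semaphore) can be attached by name from processes that were not forked from its creator. BUFFER_NAME, BUFFER_SIZE and open_log_buffer are hypothetical names chosen for illustration.

```python
from multiprocessing import shared_memory

# Hypothetical name: every local rank must agree on it (e.g. derive it from a job ID).
BUFFER_NAME = "log_buffer_demo"
# Arbitrary size for this sketch: 1 MiB.
BUFFER_SIZE = 1 << 20


def open_log_buffer(create: bool) -> shared_memory.SharedMemory:
    """Create the region (owner process) or attach to an existing one by name."""
    if create:
        # Fails with FileExistsError if a region with this name already exists.
        return shared_memory.SharedMemory(name=BUFFER_NAME, create=True, size=BUFFER_SIZE)
    # Attaching only needs the name; no inheritance or pickling of a handle is involved.
    return shared_memory.SharedMemory(name=BUFFER_NAME)
```

In a torchrun job, local rank 0 could call open_log_buffer(create=True) before a torch.distributed.barrier() and the other local ranks could call open_log_buffer(create=False) after it. On Python versions before 3.13, the resource tracker in an attaching process may unlink the segment when that process exits; Python 3.13 added a track parameter to SharedMemory to turn that behaviour off.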
I'm trying to use multiprocessing.Semaphore, but when I create it in the local rank 0 process and broadcast it via torch.distributed.broadcast_object_list, I get a RuntimeError saying that mutexes must be created in the parent process. Creating the mutexes from a multiprocessing.Manager doesn't work either, because it fails some kind of authorization check.
How can I create mutexes for a shared resource using what is available in
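One option that sidesteps the inheritance requirement is to swap multiprocessing.Semaphore for an advisory file lock via fcntl.flock: every process opens the same lock file independently, so nothing has to be created in a parent or sent through broadcast_object_list. A minimal sketch, assuming Linux, all local ranks on one node, and a lock file on a node-local filesystem; the FileLock class and any lock path are hypothetical.

```python
import fcntl
import os


class FileLock:
    """Advisory inter-process lock based on flock(2) on a shared lock file.

    Assumption: all participating processes are on the same host and open the
    same path. No handle needs to be inherited or pickled between processes.
    """

    def __init__(self, path: str):
        self.path = path
        self._fd = None

    def __enter__(self):
        # Each instance opens its own file description; flock locks are tied to it.
        self._fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o600)
        # Block until this process holds the exclusive lock.
        fcntl.flock(self._fd, fcntl.LOCK_EX)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release the lock and close the descriptor, even if the body raised.
        fcntl.flock(self._fd, fcntl.LOCK_UN)
        os.close(self._fd)
        self._fd = None
        return False
```

Each rank would wrap its append-to-buffer or flush step in `with FileLock("/dev/shm/log_buffer.lock"):` (hypothetical path). flock semantics on network filesystems vary, so the lock file should not live on the networked volume the logs are flushed to.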