I had a quick question about making a distributed PyTorch application I built more efficient. Basically, I am implementing a full-PyTorch version of Ape-X (https://arxiv.org/pdf/1803.00933.pdf), in which the authors implement a shared replay buffer in shared memory using a TensorFlow key-value store (Appendix F).
My issue is that, since I can’t really specify how much compute each actor uses (I’m assuming one device per entry in world_size), my CPUs just get destroyed handling all these RPCs, and my hope was to build this system to scale well, independently of the number of actors.
My question: are there ops like TensorFlow’s lookup module for this kind of thing? Can I put tensors in shared memory? I understand that TCPStore’s set() only takes strings. Is there a recommended PyTorch pattern for centralized data that a learner reads and a bunch of data-generating actors append to, without overloading anything with requests?