Selecting actions for N agents on a single GPU with torch.distributed.rpc


I want to modify this code (examples/ at master · pytorch/examples · GitHub), which has 1 Agent and N observers that interact with an environment at the same time through torch.distributed.rpc.

My goal is to have N agents and 1 Simulator, where the Simulator asks the agents to sample actions and triggers updates when required.

For example, to select actions:

def select_actions_all_agents(self, state):
    # reset the action buffer to a sentinel value
    self.current_actions = np.full(self.current_actions.shape, -1000, dtype=np.int32)
    futs = []
    start_time = time.time()
    for ag_rref in self.ag_rrefs:
        # make async RPCs to kick off action selection on all agents
        futs.append(
            rpc.rpc_async(
                ag_rref.owner(),
                _call_method,
                args=(Agent.select_action, ag_rref, self.sim_rref, state),
            )
        )
    # wait until all agents have finished selecting an action
    for fut in futs:
        fut.wait()
    self.time_select_action += time.time() - start_time
    self.num_time_select_action += 1

However, it seems that it does not reduce the inference time.

When instantiating each agent, I place it on the same GPU.

class Agent:
    def __init__(self):
        self.id = rpc.get_worker_info().id
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.policy = Policy().to(self.device)
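The post never shows `Agent.select_action`; for completeness, here is a minimal, self-contained sketch of what it might look like. The `Policy` architecture, state dimension, and sampling scheme below are my assumptions, not taken from the original example:

```python
import torch
import torch.nn as nn


class Policy(nn.Module):
    # Stand-in policy network; the real Policy and its
    # dimensions are assumptions, not from the original code.
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Linear(state_dim, n_actions)

    def forward(self, x):
        return self.net(x)


class Agent:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.policy = Policy().to(self.device)

    def select_action(self, state):
        # Move the state onto the agent's device, run one forward
        # pass, and sample an action from the resulting distribution.
        state = torch.as_tensor(state, dtype=torch.float32,
                                device=self.device).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(self.policy(state), dim=-1)
        return torch.multinomial(probs, num_samples=1).item()
```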

I would expect each agent to operate in parallel on the GPU, greatly reducing the inference time.

Any ideas?

Is this the same question as this post?