How to do fwd and bwd passes in separate processes? Gradients always return None

I’m trying to modify A3C so that it’s compatible with OpenAI Universe envs, which cannot be paused and require constant interaction during training/testing.

I’ve split the fwd and bwd passes of each local agent into separate processes, so an A3C setup with 8 agents now consists of 16 processes (one fwd process and one bwd process per agent) instead of the usual 8.
This is necessary because the fwd pass must interact with the Universe env constantly and doesn’t have time to wait for the bwd pass to finish.

The issue I’m running into is that I don’t know how to send the tape (autograd graph) built by the fwd pass in the fwd process over to the bwd process so it can be used for the bwd pass. As a result, the gradients computed in the bwd process (which currently just receives rollouts from the fwd process) are always None, because autograd doesn’t realize that the rollout came from the fwd process’s graph.

Do I instead have to rebuild the fwd graph from scratch in the bwd process and hard-code its states, actions, & rewards to the fwd process’s rollout (like some TF implementations do)? Is there some way to send the fwd tape from the fwd process to the bwd process via mp.Queue or share_memory and use it in the bwd process? Also, what are the commands for grabbing the tape of the graph that produced the current values of Variables (in my fwd process) and attaching that tape to other Variables (in my bwd process)? Are these commands somewhere in the engine or Variable source code?

Python pseudocode of what I’m trying to do is below:

import torch.multiprocessing as mp
import torch.optim as optim
from model import MyModel
import gym, universe

# point the shared model's .grad at the local model's grads (skip once already linked)
def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

def fwd(shared_model, q):
    env = gym.make("universe_env_that_requires_incessant_interaction")
    model = MyModel()
    while True:
        model.load_state_dict(shared_model.state_dict())

        # collect on-policy rollout of model's interaction with env for t steps

        q.put((model, rollout))
        # ^would like this to be q.put((model, rollout, tape_of_fwd_graph_that_produced_rollout)),
        # but I don't know how to get tape of fwd graph

def bwd(shared_model, q):
    optimizer = optim.Adam(shared_model.parameters())  # optimizer over the shared params
    while True:
        # ↓ on-policy rollout sent from fwd interacting with env
        model, rollout = q.get()
        # ^would like this to be model, rollout, tape_of_fwd_graph_that_produced_rollout = q.get()

        # would like to assign tape_of_fwd_graph_that_produced_rollout to variables in rollout,
        # but I don't know what command to use.
        
        loss = compute_loss(rollout)

        optimizer.zero_grad()

        loss.backward()
        ensure_shared_grads(model, shared_model)
        # ^grads return None here because 
        # autograd thinks loss is some random number
        # and doesn't realize its creator went through a graph in fwd process.
        # I've tried model.share_memory when model is declared 
        # and that doesn't work either.
        # Do I instead have to rebuild the fwd graph from scratch in the bwd
        # process and hard code its states, actions, & rewards to fwd process's 
        # rollout (like some TF implementations do)?
        # ^Inside of the bwd process, how do I get autograd to compute the gradient of loss
        # with respect to the tape of the fwd run from the fwd process? Is there some kind of way
        # to send the tape via mp.Queue() or share_memory and use it in the bwd process?

        optimizer.step()

if __name__ == '__main__':
    num_processes = 4
    shared_model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    shared_model.share_memory()
    processes = []
    for rank in range(num_processes):
        q = mp.Queue(1)  # queue has length 1 because we discard off-policy past rollouts for now

        # fwd must send actions incessantly to universe env,
        # so fwd and bwd split into 2 processes so that fwd doesn't have to wait for bwd to finish
        p = mp.Process(target=fwd, args=(shared_model, q))
        p.start()
        processes.append(p)
        p = mp.Process(target=bwd, args=(shared_model, q))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
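
In case it helps clarify the fallback I mentioned above: here’s a rough, untested sketch of rebuilding the graph in the bwd process by sending only raw data across the queue and replaying the fwd pass locally. MyModel and ensure_shared_grads are the same placeholders as above; compute_loss here is a hypothetical variant that takes the replayed outputs plus actions & rewards:

def fwd_replay(shared_model, q):
    env = gym.make("universe_env_that_requires_incessant_interaction")
    model = MyModel()
    while True:
        model.load_state_dict(shared_model.state_dict())
        states, actions, rewards = [], [], []
        # ... interact with env for t steps, appending raw (detached) data ...
        # only plain tensors/lists cross the queue -- no autograd graph needed
        q.put((model.state_dict(), states, actions, rewards))

def bwd_replay(shared_model, q):
    model = MyModel()
    optimizer = optim.Adam(shared_model.parameters())
    while True:
        weights, states, actions, rewards = q.get()
        model.load_state_dict(weights)  # same weights the rollout was collected with
        # replay the fwd pass so the graph gets built in *this* process
        outputs = [model(s) for s in states]
        loss = compute_loss(outputs, actions, rewards)
        optimizer.zero_grad()
        loss.backward()  # works now: the tape lives in this process
        ensure_shared_grads(model, shared_model)
        optimizer.step()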

You cannot do forward and backward passes in separate processes.

However, have you looked at just having the models themselves run wholly in separate processes?

For example, have you looked at our Hogwild example?

Or https://github.com/ikostrikov/pytorch-a3c
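
Roughly, the pattern both of those use looks like this (a minimal sketch; MyModel, compute_loss, ensure_shared_grads, and the rollout collection are placeholders standing in for your own code):

import torch.multiprocessing as mp
import torch.optim as optim

def train(shared_model):
    model = MyModel()  # local copy; its autograd graph never leaves this process
    optimizer = optim.Adam(shared_model.parameters())
    while True:
        model.load_state_dict(shared_model.state_dict())
        rollout = ...  # collect a rollout with `model`, in this same process
        loss = compute_loss(rollout)
        optimizer.zero_grad()
        loss.backward()  # grads exist because the graph was built here
        ensure_shared_grads(model, shared_model)
        optimizer.step()

if __name__ == '__main__':
    shared_model = MyModel()
    shared_model.share_memory()
    processes = [mp.Process(target=train, args=(shared_model,)) for _ in range(8)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()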


Hi @ethancaballero,

I know you posted this a long time ago, but did you ever manage to get anything (non-Atari) working using OpenAI Universe?

It seems not many folks have had success with OpenAI’s Universe - not many papers use/cite it - but I’d love to hear otherwise.

@AjayTalati I think OpenAI has internally pivoted Universe toward something similar to the synchronous environments from Alex Nichol’s (unixpickle) muniverse (they recently recruited him). They’ll probably release a new synchronous version of Universe by the end of the year. The current version of Universe seems to be abandoned.


Ah that helps! Thanks @ethancaballero

I was wondering whether you’ve ever heard of people using Universe for website testing - just a vague idea I had. Nothing really concrete that I’ve thought through, but I guess it could be used for something like testing online experiments before they’re actually launched “into the wild”.

A lot of websites have chatbots/messenger bots, so that’s one of the areas where a Universe interface could be used for testing?