Using 2 GPUs for Different Parts of the Model

One approach…

Start 2 Python programs in separate interpreters to avoid the dreaded GIL.

Processor 1

  1. Put the tensor on cuda:0 and run the first part of the model to get the output.
  2. Serialize the output and push it to a shared Redis database (a minimal sketch follows the list).
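
The producer side could look roughly like this, assuming redis-py, a Redis server on localhost, and a placeholder `stage1` module standing in for the first part of your model:

```python
import io

import redis
import torch

# Placeholder for the first part of the model, kept on the first GPU.
stage1 = torch.nn.Linear(1024, 1024).to("cuda:0")

r = redis.Redis(host="localhost", port=6379)

def push_activation(x):
    # 1. Put the tensor on cuda:0 and run the first stage.
    out = stage1(x.to("cuda:0"))
    # 2. Serialize the output (moved to CPU) and push it onto a shared Redis list.
    buf = io.BytesIO()
    torch.save(out.detach().cpu(), buf)
    r.rpush("activations", buf.getvalue())
    return out  # keep the on-GPU output around if you need it for backprop later
```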

Processor 2

  1. The consumer picks the output up from the database and pushes it to cuda:1.
  2. The consumer runs the next step of the calculation (see the sketch below).
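
The consumer side would be the mirror image, again with a placeholder `stage2` module standing in for the second part of the model:

```python
import io

import redis
import torch

# Placeholder for the second part of the model, kept on the second GPU.
stage2 = torch.nn.Linear(1024, 1024).to("cuda:1")

r = redis.Redis(host="localhost", port=6379)

def run_next_stage():
    # 1. Block until the producer has pushed an activation, then pop it.
    _key, payload = r.blpop("activations")
    x = torch.load(io.BytesIO(payload)).to("cuda:1")
    # 2. Run the next step of the calculation on cuda:1.
    return stage2(x)
```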

If you need to send gradients back for backprop, you can store and reload them the same way (sketched below).
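
A rough sketch of that round trip, reusing the placeholder names from the two snippets above (`r`, `stage2`, `out`, `payload`):

```python
# Process 2: treat the received activation as a leaf, backprop the loss to it,
# and push the resulting gradient back through Redis.
x = torch.load(io.BytesIO(payload)).to("cuda:1").requires_grad_()
loss = stage2(x).sum()               # placeholder loss
loss.backward()
buf = io.BytesIO()
torch.save(x.grad.cpu(), buf)
r.rpush("gradients", buf.getvalue())

# Process 1: pop the gradient and continue backprop through the first stage.
_key, payload = r.blpop("gradients")
grad = torch.load(io.BytesIO(payload)).to("cuda:0")
out.backward(grad)                   # `out` is the stage-1 output kept earlier
```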

That’s one way… not easy, though. I spent easily a month just trying to distribute calculations over multiple processors.

If you can pull it off… then it’s an awesome skill.

Also, there is the Ray project (GitHub - ray-project/ray): a unified framework for scaling AI and Python applications, consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads.

I tried using it. It had great promise, but ended up being a bit too new at the time. It might be a bit more mature now.
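
For comparison, the Ray version of the same idea is much shorter, since Ray handles the process spawning and data movement for you. This is only a sketch with placeholder `Stage` actors, each pinned to one GPU:

```python
import ray
import torch

ray.init()

# Placeholder actor; Ray pins each instance to one whole GPU via num_gpus=1.
@ray.remote(num_gpus=1)
class Stage:
    def __init__(self):
        self.layer = torch.nn.Linear(1024, 1024).to("cuda")

    def forward(self, x):
        return self.layer(x.to("cuda")).cpu()

stage1, stage2 = Stage.remote(), Stage.remote()
x = torch.randn(8, 1024)
# Ray moves the intermediate result between the two actors for you.
out = ray.get(stage2.forward.remote(stage1.forward.remote(x)))
```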