What is the difference between rank and local-rank?

Hi @AlexLuya,

In the context of multi-node training, you have:

  • local_rank, the rank of the process on its own machine (node).
  • rank, the global rank of the process across all nodes, unique among all processes in the job.
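In practice, launchers such as torchrun pass both values to each process through the environment variables `LOCAL_RANK` and `RANK` (plus `WORLD_SIZE`, the total number of processes). Here is a minimal sketch of reading them, assuming the process was started by such a launcher. `read_ranks` is a hypothetical helper name, and the fallback values for a plain single-process run are an assumption:

```python
import os


def read_ranks(env=None):
    """Return (local_rank, rank, world_size) from launcher-provided env vars.

    Hypothetical helper: falls back to a single-process setup
    (0, 0, 1) when the variables are absent.
    """
    if env is None:
        env = os.environ
    local_rank = int(env.get("LOCAL_RANK", 0))  # index of the process on this node
    rank = int(env.get("RANK", 0))              # index of the process across all nodes
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return local_rank, rank, world_size


# Example: what the last process (p4) in the 2-node x 2-GPU setup would see.
print(read_ranks({"LOCAL_RANK": "1", "RANK": "3", "WORLD_SIZE": "4"}))
```

`local_rank` is typically what you use to pick the GPU on the current machine, while `rank` identifies the process globally (for example, to let only rank 0 write checkpoints or logs).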

To illustrate the difference, let's say you have 2 nodes (machines) with 2 GPUs each. You will then have a total of 4 processes (p1…p4):

            |    Node1    |    Node2    |
            | p1   | p2   | p3   | p4   |
local_rank  | 0    | 1    | 0    | 1    |
rank        | 0    | 1    | 2    | 3    |
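The table above follows a simple pattern: each node contributes a contiguous block of global ranks, so `rank = node_rank * gpus_per_node + local_rank`. Here is a small sketch that reproduces the table, assuming every node runs the same number of processes and ranks are assigned node by node (a common convention, not something every launcher guarantees). `rank_table` is a hypothetical helper name:

```python
def rank_table(num_nodes, gpus_per_node):
    """List (node_rank, local_rank, rank) for every process.

    Assumes one process per GPU, the same GPU count on every node,
    and global ranks assigned contiguously node by node.
    """
    rows = []
    for node_rank in range(num_nodes):
        for local_rank in range(gpus_per_node):
            rank = node_rank * gpus_per_node + local_rank  # global index
            rows.append((node_rank, local_rank, rank))
    return rows


# 2 nodes x 2 GPUs, matching the table: global ranks run 0..3.
for node_rank, local_rank, rank in rank_table(2, 2):
    print(f"Node{node_rank + 1}: local_rank={local_rank}, rank={rank}")
```

Note that `local_rank` restarts at 0 on each node, while `rank` keeps counting up to `world_size - 1`.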