Pipeline Parallelism (Pipe) Assertion Failure

I am running torch.distributed.pipeline.sync.Pipe with PyTorch 1.8.1 on Python 3.8 (I also tried the nightly build). I have 2 visible devices. Below is the example from the docs.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributed.pipeline.sync import Pipe
from torchgpipe import GPipe

# Run with Pipe
fc1 = nn.Linear(16, 8).cuda(0)
fc2 = nn.Linear(8, 4).cuda(1)
model = nn.Sequential(fc1, fc2)
model = Pipe(model, chunks=8)
input = torch.rand(16, 16).cuda(0)
output_rref = model(input)

# Run with GPipe
fc1 = nn.Linear(16, 8)
fc2 = nn.Linear(8, 4)
model = nn.Sequential(fc1, fc2)
model = GPipe(model, balance=[1,1], chunks=8)
model = nn.DataParallel(model)
input = torch.rand(16, 16).cuda(0)
output_rref = model(input)
print(output_rref)

I am getting this error:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    output_rref = model(input)
  File "/usr0/home/ruohongz/anaconda3/envs/py38-pt18/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr0/home/ruohongz/anaconda3/envs/py38-pt18/lib/python3.8/site-packages/torch/distributed/pipeline/sync/pipe.py", line 366, in forward
    return RRef(output)
RuntimeError: agent INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/distributed/rpc/rpc_agent.cpp":247, please report a bug to PyTorch. Current RPC agent is not set!

However, the GPipe code works fine. What is causing the PyTorch assertion failure?

You need to initialize the RPC framework before constructing Pipe. Pipe's forward returns an RRef, which depends on torch.distributed.rpc; since no RPC agent has been set up in your script, the internal assertion fires. See the latest master docs: Pipeline Parallelism — PyTorch master documentation
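
For reference, here is a minimal sketch of the pattern the docs show, assuming a single-process, single-machine setup (the worker name "worker" and port 29500 follow the documentation's example and can be changed):

import os
import torch
import torch.nn as nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

# Pipe's forward returns an RRef, so an RPC agent must be initialized
# first -- even when everything runs in a single process.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker", rank=0, world_size=1)

fc1 = nn.Linear(16, 8).cuda(0)
fc2 = nn.Linear(8, 4).cuda(1)
model = nn.Sequential(fc1, fc2)
model = Pipe(model, chunks=8)

input = torch.rand(16, 16).cuda(0)
output_rref = model(input)
output = output_rref.to_here()  # fetch the local tensor from the RRef
print(output.shape)

rpc.shutdown()

Calling rpc.init_rpc before constructing Pipe sets the current RPC agent that RRef(output) needs, which is why the assertion no longer fires. GPipe does not go through the RPC framework at all, which is why your second snippet runs without it.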