Running a PyTorch distributed example in Jupyter Notebook


I'm trying to run the example from the tutorial with the “Gloo” backend and point-to-point communication.

```python
#!/usr/bin/env python
import os
import torch
import torch.distributed as dist
from torch.multiprocessing import Process

def run(rank, size):
    tensor = torch.zeros(1)
    if rank == 0:
        tensor += 1
        # Send the tensor to process 1
        dist.send(tensor=tensor, dst=1)
    else:
        # Receive tensor from process 0
        dist.recv(tensor=tensor, src=0)
    print('Rank ', rank, ' has data ', tensor[0])

def init_process(rank, size, fn, backend='gloo'):
    """ Initialize the distributed environment. """
    # All ranks run on this machine, so they rendezvous on localhost
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

if __name__ == "__main__":
    size = 2
    processes = []
    for rank in range(size):
        p = Process(target=init_process, args=(rank, size, run))
        p.start()
        processes.append(p)

    # Wait for every rank to finish before reporting
    for p in processes:
        p.join()
    print("done")
```

When I run it, only “done” is printed in the Jupyter notebook.
How do I get it to run properly?

I tried this on Colab but couldn't reproduce the problem. Multiprocessing sometimes behaves strangely in notebooks. If you launch this program directly from the command line, is the output as expected?
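One way to compare the two environments without leaving the notebook is to launch the saved script in a fresh Python interpreter and capture everything it prints, including anything the child processes write to stdout or stderr. This is a minimal sketch: `run_script` is a hypothetical helper, and it assumes the tutorial code has been saved to a `.py` file.

```python
import subprocess
import sys


def run_script(path, timeout=60):
    """Run a Python script in a fresh interpreter and capture its output.

    `path` is assumed to point at the tutorial code saved as a .py file.
    Returns (exit code, captured stdout, captured stderr).
    """
    result = subprocess.run(
        [sys.executable, path],  # same interpreter the notebook kernel uses
        capture_output=True,     # collect stdout/stderr instead of inheriting them
        text=True,               # decode the captured bytes to str
        timeout=timeout,         # raise TimeoutExpired instead of hanging if a rank deadlocks
    )
    return result.returncode, result.stdout, result.stderr
```

In a notebook cell you would then call something like `code, out, err = run_script("run.py")` (the file name is hypothetical) and print `out` and `err`. A non-zero `code` or a traceback in `err` points at a crash rather than lost output.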

Yes. It works when I run it with Python from the command line.
But I want to ask about running it in Jupyter.
How do I make it work in Jupyter? That's my question.

As I cannot reproduce the error in my Jupyter notebook, I can only guess why the messages from the subprocesses are not shown. Given that the main process prints “done”, I would assume the subprocesses are launched correctly. But since they didn't print their messages, either 1) a subprocess crashed, or 2) a subprocess is not printing to stdout. For 1), you can check the exitcode of each subprocess; adding more logging will also help. For 2), you will need to check your local configuration to see whether stdout is redirected, or explicitly redirect that print to a file.
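Both checks can be sketched with the standard-library `multiprocessing` module; `torch.multiprocessing` re-exports the same API, so `Process.exitcode` and `join()` behave the same way there. In this sketch, `worker`, `launch_and_check`, the per-rank log file names, and the simulated crash on rank 1 are hypothetical stand-ins for `init_process`/`run` in the tutorial code. Each rank writes to its own file, so the result does not depend on where the notebook sends a child's stdout.

```python
import multiprocessing as mp
import os
import tempfile
import traceback


def worker(rank, log_dir):
    """Hypothetical worker that logs to a per-rank file instead of stdout."""
    log_path = os.path.join(log_dir, f"rank{rank}.log")
    with open(log_path, "w") as log:
        try:
            if rank == 1:
                # Simulated failure, standing in for e.g. init_process_group raising
                raise RuntimeError("simulated crash on rank 1")
            log.write(f"Rank {rank} finished\n")
        except Exception:
            # Keep the traceback in the log, then re-raise so the exit code is non-zero
            traceback.print_exc(file=log)
            raise


def launch_and_check(size, log_dir):
    """Start `size` workers, wait for them, and return {rank: exitcode}."""
    procs = [mp.Process(target=worker, args=(rank, log_dir)) for rank in range(size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # exitcode: 0 = clean exit, 1 = uncaught exception, -N = killed by signal N
    return {rank: p.exitcode for rank, p in enumerate(procs)}


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    for rank, code in launch_and_check(2, log_dir).items():
        status = "ok" if code == 0 else f"FAILED (exitcode={code})"
        print(f"rank {rank}: {status}")
        with open(os.path.join(log_dir, f"rank{rank}.log")) as f:
            print(f.read().rstrip())
```

If a rank's exit code is non-zero, its log file holds the traceback even when nothing shows up in the notebook output.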


Thank you so much!

Hi all,

I found the cause.
I ran Jupyter on a MacBook, and it worked.
On Windows, the program only printed “done”.


The PyTorch distributed package does not support Windows yet, so the subprocesses most likely crashed because init_process_group is not available on Windows.
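Rather than letting the subprocesses fail silently, the notebook can check up front whether the installed build supports what the example needs. Support depends on the PyTorch build and version (newer releases have added Windows support for the Gloo backend), so this sketch asks at runtime via `torch.distributed.is_available()` and `torch.distributed.is_gloo_available()`. The `distributed_ready` helper itself is hypothetical.

```python
import sys


def distributed_ready():
    """Return (ok, message) describing whether the Gloo example can run here."""
    try:
        import torch.distributed as dist
    except ImportError:
        return False, "PyTorch is not installed in this environment"
    # is_available() is False on builds compiled without distributed support
    if not dist.is_available():
        return False, f"torch.distributed is not available on this build (platform: {sys.platform})"
    # The tutorial uses the Gloo backend, so check for it specifically
    if not dist.is_gloo_available():
        return False, "the Gloo backend is not available on this build"
    return True, "torch.distributed with the Gloo backend is available"


if __name__ == "__main__":
    ok, message = distributed_ready()
    print(ok, message)
```

Running this in the same kernel before starting the processes turns a silent crash into an explicit message.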