Multi-GPU DDP in a Jupyter Notebook

I am trying to run the example from the DDP tutorial:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

def example(rank, world_size):
    # create default process group
    dist.init_process_group("nccl", rank=rank, init_method=None, world_size=world_size)
    # create local model
    model = nn.Linear(10, 10).to(rank)
    # construct DDP model
    ddp_model = DDP(model, device_ids=[rank])
    # define loss function and optimizer
    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

    # forward pass
    outputs = ddp_model(torch.randn(20, 10).to(rank))
    labels = torch.randn(20, 10).to(rank)
    # backward pass
    loss_fn(outputs, labels).backward()
    # update parameters
    optimizer.step()

def main():
    world_size = 2
    mp.spawn(example,
        args=(world_size,),
        nprocs=world_size,
        join=True)

if __name__ == '__main__':
    main()

I get the following error:
Exception: process 0 terminated with exit code 1

I am running this in a Jupyter notebook inside a Docker container.

When I run the same code as a script inside the container, but outside Jupyter, it seems to work fine.

Why does it fail in Jupyter?

More generally, what is the recommended way to use DDP from a notebook?

Perhaps this thread could help link
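
Since the script version reportedly works, one workaround is to keep the DDP code in a script file and launch it from the notebook as a separate Python process. A likely cause of the failure (an assumption, not confirmed in this thread) is that mp.spawn starts fresh interpreters that have to import the target function, and a function defined in a notebook cell exists only in the kernel's __main__. Below is a minimal sketch of the workaround. The file name ddp_script.py, the helpers write_script and run_script, and the MASTER_ADDR / MASTER_PORT values are illustrative choices, not something from the tutorial or the original post.

```python
import subprocess
import sys
import textwrap
from pathlib import Path

# Source of the worker script. It mirrors the example from the question;
# the rendezvous settings in the __main__ block are assumed single-machine values.
DDP_SCRIPT = textwrap.dedent('''
    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn as nn
    import torch.optim as optim
    from torch.nn.parallel import DistributedDataParallel as DDP

    def example(rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        model = nn.Linear(10, 10).to(rank)
        ddp_model = DDP(model, device_ids=[rank])
        loss_fn = nn.MSELoss()
        optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)
        outputs = ddp_model(torch.randn(20, 10).to(rank))
        labels = torch.randn(20, 10).to(rank)
        loss_fn(outputs, labels).backward()
        optimizer.step()
        dist.destroy_process_group()

    if __name__ == "__main__":
        os.environ.setdefault("MASTER_ADDR", "localhost")
        os.environ.setdefault("MASTER_PORT", "29500")
        world_size = 2
        mp.spawn(example, args=(world_size,), nprocs=world_size, join=True)
''').lstrip()


def write_script(path="ddp_script.py"):
    """Write the worker script to disk and return its path (hypothetical helper)."""
    script_path = Path(path)
    script_path.write_text(DDP_SCRIPT)
    return script_path


def run_script(script_path):
    """Run the script in a fresh Python process and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,
        text=True,
    )
```

In a notebook cell you would then call result = run_script(write_script()) and print result.stdout and result.stderr. Because the child process is an ordinary Python script, any traceback from the workers shows up in result.stderr instead of being reduced to "process 0 terminated with exit code 1".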