FSDP with Single Node & Multi-GPU

What's missing from the following example code? When I run it, I get this error: RuntimeError: "Default process group has not been initialized, please make sure to call init_process_group."

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
torch.cuda.set_device(device_id)
sharded_module = FSDP(my_module)
optim = torch.optim.Adam(sharded_module.parameters(), lr=0.0001)
x = sharded_module(x, y=3, z=torch.Tensor([1]))
loss = x.sum()
loss.backward()
optim.step()

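Based on the FSDP tutorial, you are missing the distributed setup call. torch.distributed.init_process_group has to be called in every process before the module is wrapped in FSDP, which is exactly what the error message is asking for.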

Thanks for the pointer!