DistributedDataParallel: using torch.cuda

Can anyone please help me resolve this error?

I have the model defined as

model = DDP(model, device_ids=[torch.distributed.get_rank() % torch.cuda.device_count()])

ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device None, and module parameters {device(type='cpu')}

Did you try moving the model to the GPU with model.to(rank)? It seems it's still stored on the CPU.
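For reference, a minimal sketch of the usual ordering (assuming one process per GPU, a launcher such as torchrun that sets up the process group env vars, and a hypothetical MyModel class):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend='nccl')
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = MyModel()             # hypothetical model class
model = model.to(local_rank)  # parameters must be on the GPU before wrapping
model = DDP(model, device_ids=[local_rank])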

Thanks for the response

Yes, I added this line instead:

model = model.to('cuda')

before wrapping it in DDP, which fixed the error.
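One thing worth noting: model.to('cuda') resolves to the current CUDA device, which is cuda:0 unless the process has called torch.cuda.set_device (or CUDA_VISIBLE_DEVICES restricts each process to a single GPU). With one process per GPU, it is safer to move the model to the local rank explicitly, e.g.

model = model.to(local_rank)  # local_rank = dist.get_rank() % torch.cuda.device_count()

so that different ranks on the same node don't all end up on GPU 0.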