Load DDP model trained with 8 gpus on only 2 gpus?

@mrshenli thanks again. I will try to answer all your inquiries in more detail later today.

Unfortunately, I could not use your script as-is, because my already-saved DDP checkpoint was saved via state_dict() on the DDP wrapper itself (without “.module”).
So as for the minor changes, I did the following:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader

def main(args):
    dist.init_process_group(backend='nccl', init_method='env://')
    test_loader = DataLoader(
        test_dataset,
        batch_size=args.test_batch_size,
        shuffle=False,
        num_workers=args.num_workers,
        pin_memory=True)

    model = get_model()
    #############################################################
    # My changes
    torch.cuda.set_device(args.local_rank)
    model = model.to(args.local_rank)
    model = DDP(model, device_ids=[args.local_rank],
                output_device=args.local_rank)
    checkpoint = torch.load(args.load_path)  # , map_location=map_location)
    state_dict = checkpoint['model_state_dict']
    model.load_state_dict(state_dict)
    #############################################################
    dist.barrier()
    test_function(model, test_loader, args.local_rank,
                  args.load_path.with_suffix('.csv'))
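On the commented-out map_location: since the checkpoint was written by an 8-GPU run, one way I understand it could be remapped for a 2-GPU run is a dict that sends every saved device to the local rank. This is just a sketch under my own assumptions (the helper name and the saved world size of 8 are mine, not from your script):

```python
# Hypothetical helper: map tensors saved on any of the 8 training GPUs
# onto this process's GPU, so a 2-GPU run can load an 8-GPU checkpoint.
def build_map_location(local_rank, saved_world_size=8):
    # torch.load accepts a dict of {saved_device: target_device} strings
    return {f'cuda:{i}': f'cuda:{local_rank}' for i in range(saved_world_size)}

# illustrative usage:
# checkpoint = torch.load(args.load_path,
#                         map_location=build_map_location(args.local_rank))
```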

I trained resnet18 from scratch; I just copied the resnet script and used it locally.

As for your last two comments: I did save the checkpoint from rank 0 only, but I saved the state_dict() of the DDP wrapper itself (without .module). That is why, when I used your script, I also had to strip the .module prefix, similar to this thread:
[solved] KeyError: ‘unexpected key “module.encoder.embedding.weight” in state_dict’
Is it correct to do it this way?
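To make concrete what I mean by stripping the prefix, here is a rough sketch (the helper name is mine, not from that thread) of removing the “module.” prefix that DDP prepends to every parameter name, so the state_dict can be loaded into an unwrapped model:

```python
from collections import OrderedDict

# Hypothetical helper: drop the 'module.' prefix that DDP adds when
# state_dict() is called on the wrapper instead of wrapper.module.
def strip_module_prefix(state_dict):
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len('module.'):] if key.startswith('module.') else key
        cleaned[new_key] = value
    return cleaned

# illustrative usage:
# model.load_state_dict(strip_module_prefix(checkpoint['model_state_dict']))
```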