Define num_workers with Distributed Data Parallel

Hi,
I’m a PyTorch beginner. I want to distribute data across several GPUs with Distributed Data Parallel.
I run this code on a 4-GPU node, so my world size is 4.
My question is about the validation batch_size and num_workers (for both validation and training).

Will each process get a validation batch_size of 4/world_size and num_workers of 4/world_size?

Will each process get a training num_workers of 16/world_size?



import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def get_dataset(data_folder, train_images, val_images, CROP_SIZE, UPSCALE_FACTOR):
    world_size = dist.get_world_size()

    # Create the training and validation datasets (custom dataset classes)
    train_set = TrainDatasetFromFolder(data_folder, crop_size=CROP_SIZE, upscale_factor=UPSCALE_FACTOR,
                                       image_list=train_images)
    val_set = ValDatasetFromFolder(data_folder, upscale_factor=UPSCALE_FACTOR, image_list=val_images)

    # Each replica gets a different shard of the data
    train_sampler = DistributedSampler(train_set, num_replicas=world_size)
    val_sampler = DistributedSampler(val_set, num_replicas=world_size)

    # Split the global batch size of 80 across the processes
    batch_size = int(80 / world_size)
    print(world_size, batch_size)

    train_loader = DataLoader(
        dataset=train_set,
        sampler=train_sampler,
        batch_size=batch_size,
        num_workers=16,
        pin_memory=True,
    )
    val_loader = DataLoader(
        dataset=val_set,
        sampler=val_sampler,
        batch_size=4,
        num_workers=4,
    )

    return train_loader, val_loader, batch_size

The parameters you pass to DataLoader are not affected by world_size: each process constructs its own DataLoader, so every rank gets exactly the batch_size and num_workers you specify (here 16 training workers and a validation batch_size of 4 per process). If you want those values to scale with the number of processes, it’s up to you to divide by world_size yourself, as you already do for the training batch_size.
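
For example, here is a minimal sketch that makes the division explicit on every rank. It assumes dist.init_process_group has already been called; make_loader, GLOBAL_BATCH_SIZE, and GLOBAL_NUM_WORKERS are illustrative names I made up (they are not part of your code or of the PyTorch API), and the global budgets of 80 and 16 mirror the values in your snippet:

import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Hypothetical global budgets; each of the world_size processes gets its
# share only because we divide explicitly -- DataLoader will not do it.
GLOBAL_BATCH_SIZE = 80
GLOBAL_NUM_WORKERS = 16


def make_loader(dataset, shuffle=True):
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    per_rank_batch_size = GLOBAL_BATCH_SIZE // world_size  # e.g. 80 // 4 = 20
    per_rank_workers = GLOBAL_NUM_WORKERS // world_size    # e.g. 16 // 4 = 4

    sampler = DistributedSampler(dataset, num_replicas=world_size,
                                 rank=rank, shuffle=shuffle)
    return DataLoader(dataset,
                      sampler=sampler,
                      batch_size=per_rank_batch_size,
                      num_workers=per_rank_workers,
                      pin_memory=True)

With your current code, by contrast, each of the 4 processes spawns 16 training workers and 4 validation workers, so the node runs 4 * (16 + 4) = 80 worker processes in total.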