I am training a model that does not make full use of the GPU's compute or memory. Training runs on two 2080 Ti GPUs using DistributedDataParallel.
How can I concurrently train two models per GPU (each with different parameters), so that the GPUs are more fully utilized?
The following code currently trains a single model replicated across the two GPUs.
```python
import torch
import torch.multiprocessing as mp
import torch.distributed as dist
import torch.nn as nn

def train(gpu, args):
    dist.init_process_group(
        backend='nccl',
        init_method='env://',
        world_size=args['world_size'],
        rank=args['nr'] * args['gpu'] + gpu
    )
    ...
    torch.cuda.set_device(gpu)
    model.cuda(gpu)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[gpu])

    # training loop
    for epoch in range(num_epochs):
        ...

if __name__ == '__main__':
    mp.spawn(train, nprocs=2, args=(args,))
```
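For context, here is a minimal sketch of the scheduling I have in mind: spawn one process per model (four total) and pack them two per GPU. `assign_device`, `GPUS_PER_NODE`, and `MODELS_PER_GPU` are hypothetical names of my own, and the actual training is elided; this only illustrates the worker-to-GPU mapping, not whether DDP allows it.

```python
GPUS_PER_NODE = 2    # the two 2080 Tis
MODELS_PER_GPU = 2   # goal: two independent models per card

def assign_device(worker, gpus_per_node=GPUS_PER_NODE):
    """Map a spawned worker index to (gpu, slot)."""
    gpu = worker % gpus_per_node     # round-robin across cards
    slot = worker // gpus_per_node   # which of the two models on that card
    return gpu, slot

# With mp.spawn(train, nprocs=GPUS_PER_NODE * MODELS_PER_GPU, args=(args,)),
# each train(worker, args) would call torch.cuda.set_device(gpu) and then
# build and train its own model independently -- i.e. no process group
# shared between the two models sitting on the same card.
for w in range(GPUS_PER_NODE * MODELS_PER_GPU):
    print(w, assign_device(w))
# → 0 (0, 0)
#   1 (1, 0)
#   2 (0, 1)
#   3 (1, 1)
```

The open question is whether each of those four processes can safely run its own training loop (or its own single-member process group) without the NCCL/DDP machinery in the code above getting in the way.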