I am training a model that does not make full use of the GPU's compute and memory. Training is carried out over two 2080 Ti GPUs using DistributedDataParallel.
How can we concurrently train 2 models per GPU (each with different parameters), so that the GPUs are more fully utilized?
The following code currently trains only 1 model across the 2 GPUs.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn

def train(gpu, args):
    # one process per GPU; rank = node_rank * gpus_per_node + local gpu index
    dist.init_process_group(
        backend='nccl',
        init_method='env://',
        world_size=args['world_size'],
        rank=args['nr'] * args['gpus'] + gpu,
    )
    ...
    torch.cuda.set_device(gpu)
    model.cuda(gpu)
    # wrap the model so gradients are synchronized across the 2 processes
    model = nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    # training loop
    for epoch in range(num_epochs):
        ...

if __name__ == '__main__':
    mp.spawn(train, nprocs=2, args=(args,))
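For context, the layout being asked about can be sketched as follows. Since the 2 models per GPU train with different parameters and never exchange gradients, each can live in its own process with no process group at all: spawn one worker per (gpu, model) slot and map workers to GPUs with `proc_id % num_gpus`. Everything here (the toy `nn.Linear` model, the per-process learning rates, the constants) is hypothetical, and the sketch falls back to CPU when CUDA is unavailable so it stays runnable; it is not a drop-in replacement for the DDP script above.

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

NUM_GPUS = 2          # the two 2080 Tis
MODELS_PER_GPU = 2    # hypothetical: independent models sharing each GPU

def train(proc_id, hparams):
    """Train one independent model per process; no init_process_group is
    needed because the models never synchronize gradients."""
    gpu = proc_id % NUM_GPUS  # map process -> GPU (two processes land on each)
    device = torch.device(f'cuda:{gpu}' if torch.cuda.is_available() else 'cpu')
    lr = hparams[proc_id]['lr']  # per-model hyperparameters

    model = nn.Linear(8, 1).to(device)  # toy stand-in for the real model
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x = torch.randn(32, 8, device=device)
    y = torch.randn(32, 1, device=device)

    for _ in range(10):  # tiny stand-in training loop
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

if __name__ == '__main__':
    # one hyperparameter dict per (gpu, model) slot
    hparams = [{'lr': 10.0 ** -(i + 1)} for i in range(NUM_GPUS * MODELS_PER_GPU)]
    mp.spawn(train, nprocs=NUM_GPUS * MODELS_PER_GPU, args=(hparams,))
```

Note that two processes sharing one GPU time-slice its compute by default; whether this actually improves throughput depends on how underutilized each model leaves the GPU.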