Parallel GPU training - Multiple models, same input data

Hi all!
Is the following pipeline suitable for training two models at the same time on two different GPUs, while using the same input data in order to optimize RAM usage?
Or are there any "hidden" issues or data-sharing/concurrency problems I am unaware of?

import threading

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data

elements = [...]  # list of input data loaded in RAM

customDataset_valid = CustomDataset(elements[valid_from: valid_to])
customDataset_test = CustomDataset(elements[test_from: test_to])
customDataset_train = CustomDataset(elements[train_from: train_to])

loader_valid_1 = torch.utils.data.DataLoader(dataset=customDataset_valid, batch_size=BS, num_workers=0)
loader_test_1 = torch.utils.data.DataLoader(dataset=customDataset_test, num_workers=0)
loader_train_1 = torch.utils.data.DataLoader(dataset=customDataset_train, batch_size=BS, num_workers=0)

loader_valid_2 = torch.utils.data.DataLoader(dataset=customDataset_valid, batch_size=BS, num_workers=0)
loader_test_2 = torch.utils.data.DataLoader(dataset=customDataset_test, num_workers=0)
loader_train_2 = torch.utils.data.DataLoader(dataset=customDataset_train, batch_size=BS, num_workers=0)

net1 = NET_A()
net1 = net1.to(device_1, dtype=torch.float)
criterion1 = nn.CrossEntropyLoss().to(device_1, dtype=torch.float)
optimizer1 = optim.SGD(net1.parameters(), lr=LR)  # LR: learning-rate placeholder, like BS
thread1 = threading.Thread(target=my_train, args=(device_1, criterion1, optimizer1, net1, loader_train_1, loader_valid_1))

net2 = NET_B()
net2 = net2.to(device_2, dtype=torch.float)
criterion2 = nn.CrossEntropyLoss().to(device_2, dtype=torch.float)
optimizer2 = optim.SGD(net2.parameters(), lr=LR)
thread2 = threading.Thread(target=my_train, args=(device_2, criterion2, optimizer2, net2, loader_train_2, loader_valid_2))

thread1.start()
thread2.start()

thread1.join()
thread2.join()

my_train is a function that simply performs the training operations using the given arguments.
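For reference, it is essentially a standard training loop along these lines (a simplified sketch; NUM_EPOCHS is a placeholder like BS, and the validation pass on loader_valid is omitted):

def my_train(device, criterion, optimizer, net, loader_train, loader_valid):
    # Sketch of the per-model training loop; the real function also
    # evaluates on loader_valid after each epoch.
    net.train()
    for epoch in range(NUM_EPOCHS):
        for inputs, labels in loader_train:
            # Move the shared CPU batch onto this model's own GPU
            inputs = inputs.to(device, dtype=torch.float)
            labels = labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()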

Thanks.
