I want to train many small models in parallel on a single GPU. The models are small enough that 20 or more easily fit on the GPU at once. Currently I can only train them sequentially, which leaves the GPU underutilized.
My code looks like this:
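```python
import torch
import torch.nn as nn
import torch.optim as optim


def main():
    num_models = 20
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # Model and dataloader() are my own definitions (not shown here).
    models = [Model().to(device) for _ in range(num_models)]

    # The models are trained one after another, which underutilizes the GPU.
    for model in models:
        optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        criterion = nn.CrossEntropyLoss()
        trainloader, testloader = dataloader()
        run_training(model, optimizer, criterion, trainloader, testloader, device)
```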
The `run_training()` function is implemented as follows:
```python
def run_training(model, optimizer, criterion, trainloader, testloader, device):
    num_epochs = 2
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        epoch_loss = 0.0
        epoch_counter = 0
        for i, data in enumerate(trainloader, 0):
            # each batch is an (inputs, labels) pair
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # loss is the batch mean, so weight it by the batch size
            epoch_loss += loss.item() * inputs.size(0)
            epoch_counter += inputs.size(0)
        print('epoch %d loss %.3f' % (epoch + 1, epoch_loss / epoch_counter))
```
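`Model` and `dataloader()` are not shown in the code above. To make the example reproducible, here is a hypothetical stand-in: a small MLP classifier and a `dataloader()` that returns a `(trainloader, testloader)` pair over random data. The architecture, the 20 input features, the 10 classes, and the dataset sizes are all assumptions, not the original definitions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class Model(nn.Module):
    """Hypothetical small classifier: 20 input features -> 10 classes."""

    def __init__(self, in_features=20, hidden=64, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def dataloader(n_train=1024, n_test=256, in_features=20, num_classes=10, batch_size=64):
    """Hypothetical stand-in returning (trainloader, testloader) over random data."""
    g = torch.Generator().manual_seed(0)  # fixed seed so the data is reproducible

    def make_split(n):
        x = torch.randn(n, in_features, generator=g)
        y = torch.randint(0, num_classes, (n,), generator=g)
        return TensorDataset(x, y)

    trainloader = DataLoader(make_split(n_train), batch_size=batch_size, shuffle=True)
    testloader = DataLoader(make_split(n_test), batch_size=batch_size)
    return trainloader, testloader
```

With these definitions, `main()` and `run_training()` run unchanged; each batch unpacks into `inputs, labels`.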
How can I parallelize this for-loop, i.e., train all the models on the GPU at the same time?
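A minimal sketch of one way to do this, assuming PyTorch 2.x with `torch.func`: stack the models' parameters with `stack_module_state` and run a single `vmap`-ed forward and backward pass over all of them, in the spirit of the PyTorch model-ensembling tutorial. It assumes every model has the same architecture, contains no in-place buffer updates (e.g. no BatchNorm in training mode), and is fed the same batches; `make_model` and `train_vectorized` are hypothetical names.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call, stack_module_state, vmap


def make_model():
    # Hypothetical small classifier (assumption); any vmap-compatible model works.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))


def train_vectorized(models, batches, lr=0.01, momentum=0.9, num_epochs=2):
    """Train all models at once on the same batches (assumed to be on the models' device).

    Returns a list with one tensor of per-model mean losses per epoch.
    """
    # Stack each parameter across models: shape [num_models, *param.shape].
    params, buffers = stack_module_state(models)
    device = next(iter(params.values())).device
    # Stateless skeleton on the meta device; only its structure is used.
    base = copy.deepcopy(models[0]).to("meta")

    def fmodel(p, b, x):
        return functional_call(base, (p, b), (x,))

    # Map over the leading (model) dim of params/buffers; share the input batch.
    batched_forward = vmap(fmodel, in_dims=(0, 0, None))
    # SGD with momentum updates each element independently, so one optimizer
    # over the stacked tensors behaves like one optimizer per model.
    optimizer = torch.optim.SGD(params.values(), lr=lr, momentum=momentum)
    num_models = len(models)
    history = []
    for _ in range(num_epochs):
        epoch_loss = torch.zeros(num_models, device=device)
        epoch_counter = 0
        for inputs, labels in batches:
            optimizer.zero_grad()
            outputs = batched_forward(params, buffers, inputs)  # [M, B, C]
            per_sample = F.cross_entropy(
                outputs.flatten(0, 1), labels.repeat(num_models), reduction="none"
            )
            per_model = per_sample.view(num_models, -1).mean(dim=1)  # [M]
            # Summing keeps each model's gradient equal to that of its own mean loss.
            per_model.sum().backward()
            optimizer.step()
            epoch_loss += per_model.detach() * inputs.size(0)
            epoch_counter += inputs.size(0)
        history.append(epoch_loss / epoch_counter)
    # Copy the trained weights back into the original modules.
    with torch.no_grad():
        for i, model in enumerate(models):
            for name, p in model.named_parameters():
                p.copy_(params[name][i])
    return history
```

Because each model's loss enters the sum separately and SGD is elementwise, this should match training the models one by one (up to floating-point error) while issuing one batched kernel per layer instead of 20 small ones.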