I was fine-tuning Inception v3 on Colab with an NVIDIA P100 GPU, batch_size = 32, on about 100K images of size 299x299. Each epoch took around 8 minutes.
I then acquired some time on GCP and set up a nice machine with 8x Tesla V100. I connected my Colab to it using Colab SDK… Then I changed the model to run in parallel as per the tutorials and increased the batch size to 32*8. However, training is now much slower, even though I can see through nvidia-smi that the program is using all 8 GPUs.
I am using an SSD disk. I wonder if I'll have to change all the layers of Inception_v3 and distribute them across the GPUs. Or is there an easier change I can make to my code below?
Let me know! This is the first time I am trying parallel processing with multiple GPUs.
This is how I build things:
import multiprocessing
from torch.utils.data import DataLoader

# Batch size: 32 per GPU across 8 GPUs
batch_size = 32 * 8
# Number of DataLoader worker processes
num_w = multiprocessing.cpu_count()
data_loaders = {
    'train': DataLoader(data['train'], batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=num_w),
    'val': DataLoader(data['val'], batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=num_w),
    'test': DataLoader(data['test'], batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=num_w),
}
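One way to check whether the input pipeline is what slows things down is to time the loader on its own, with no model involved. This is only a minimal sketch: `time_batches` is a hypothetical helper that is not part of my code, and the stand-in list below would be replaced by `data_loaders['train']` in the real script.

```python
import time
from itertools import islice


def time_batches(loader, max_batches=50):
    """Iterate over at most max_batches batches; return (batches_seen, seconds_elapsed)."""
    start = time.perf_counter()
    seen = 0
    for _ in islice(loader, max_batches):
        seen += 1
    elapsed = time.perf_counter() - start
    return seen, elapsed


# Stand-in iterable (assumption); swap in data_loaders['train'] to time the real loader.
fake_loader = [list(range(4)) for _ in range(10)]
seen, elapsed = time_batches(fake_loader, max_batches=5)
print(f"{seen} batches in {elapsed:.4f}s")
```

If the time per batch measured this way is close to the time per training step, the GPUs are probably waiting on data rather than computing.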
(…)
#(.....)
    # Download inception
    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 299
    else:
        print("Invalid model name, exiting...")
        exit()
    return model_ft, input_size
# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)
# Put the model on the GPUs
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # Wrap the model so each input batch is split across all visible GPUs
    model_ft = nn.DataParallel(model_ft)
model_ft = model_ft.to(device)
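For reference, nn.DataParallel splits each input batch along dimension 0 across the GPUs, so with batch_size = 32*8 every GPU still sees 32 images per step, while the number of steps per epoch drops. The arithmetic can be sketched as follows, assuming exactly 100,000 training images (the real count is only approximately that) and an even split; `batch_split` is a hypothetical helper used only for illustration.

```python
import math


def batch_split(num_images, global_batch, num_gpus):
    """Return (largest per-GPU chunk, steps per epoch) for a batch split along dim 0."""
    per_gpu = math.ceil(global_batch / num_gpus)
    steps_per_epoch = math.ceil(num_images / global_batch)
    return per_gpu, steps_per_epoch


# Single P100 run: batch of 32 on one GPU.
print(batch_split(100_000, 32, 1))
# 8x V100 run: batch of 32*8 split across 8 GPUs.
print(batch_split(100_000, 32 * 8, 8))
```

Under these assumptions each GPU does the same amount of work per step as the single-GPU run, so a slower epoch would point at per-step overhead (data loading, or the scatter and gather around each forward pass) rather than at the per-GPU compute.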