Thanks, all, for the replies above.
I read the topic "Couple of models in production" and, following it, implemented the code below.
First Scenario (Sequential Forward Pass):
import torch
import time
from torchvision import models
from torch.autograd import Variable
# Check whether a GPU is available
use_gpu = torch.cuda.is_available()
torch.manual_seed(123)
if use_gpu:
    torch.cuda.manual_seed(456)
# Define CNN Models:
model1 = models.resnet18(pretrained=True)
model2 = models.resnet50(pretrained=True)
# Eval Mode:
model1.eval()
model2.eval()
# Put on GPU:
if use_gpu:
    model1 = model1.cuda()
    model2 = model2.cuda()
# Create tmp Variable:
x = Variable(torch.randn(10, 3, 224, 224))
if use_gpu:
    x = x.cuda()
# Forward Pass:
tic1 = time.time()
out1 = model1(x)
out2 = model2(x)
tic2 = time.time()
sequential_forward_pass = tic2 - tic1
print('Time = ', sequential_forward_pass) # example output --> Time = 0.6485
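A note on the timing above: CUDA kernels are launched asynchronously, so reading the clock right after `model2(x)` may not include all of the GPU work. Below is a minimal sketch of a timing helper that calls a synchronization callback before each clock read. The names `timed` and `sync` are hypothetical, and the workload is a plain-Python stand-in so the sketch runs without a GPU. On a GPU the callback would presumably be `torch.cuda.synchronize`.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    sync is an optional zero-argument callable invoked before each clock
    read. On a GPU it would be torch.cuda.synchronize (an assumption here;
    this sketch does not import torch).
    """
    if sync is not None:
        sync()  # drain work queued before the timed region
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for work launched by fn() to finish
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload and stand-in sync callback (both hypothetical).
calls = []
result, elapsed = timed(lambda: sum(range(1000)), sync=lambda: calls.append("sync"))
print(result, len(calls))
```

With the script above, the call would look something like `timed(lambda: (model1(x), model2(x)), sync=torch.cuda.synchronize)` when `use_gpu` is true, and `sync=None` otherwise.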
Now I want to run the two forward passes in parallel on a single GPU.
Second Scenario (Parallel Forward Pass):
import time
import torch
from torchvision import models
import torch.multiprocessing as mp
from torch.autograd import Variable
# Check whether a GPU is available
use_gpu = torch.cuda.is_available()
torch.manual_seed(123)
if use_gpu:
    torch.cuda.manual_seed(456)
# Define Forward Pass Method:
def forward_pass_method(model, tmp_variable):
    output = model(tmp_variable)
    return output
# Define CNN Models:
model1 = models.resnet18(pretrained=True)
model2 = models.resnet50(pretrained=True)
# Eval Mode:
model1.eval()
model2.eval()
# Put on GPU:
if use_gpu:
    model1 = model1.cuda()
    model2 = model2.cuda()
# Create tmp Variable:
x = Variable(torch.randn(10, 3, 224, 224))
if use_gpu:
    x = x.cuda()
# Parallelize the forward passes:
tic1 = time.time()
model1.share_memory()
model2.share_memory()
processes = []
num_processes = 2
for i in range(num_processes):
    if i == 0:
        p = mp.Process(target=forward_pass_method, args=(model1, x))
    else:
        p = mp.Process(target=forward_pass_method, args=(model2, x))
    p.start()
    processes.append(p)
for p in processes:
    p.join()
tic2 = time.time()
parallel_forward_pass = tic2 - tic1
print('Time = ', parallel_forward_pass)
However, the second script fails with the following error:
...RuntimeError: CUDA error (3): initialization error
Could you please help me resolve this error?
It is also worth noting that I am not sure whether parallelizing forward passes on a single GPU is feasible at all.
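For context, the PyTorch multiprocessing notes say that the CUDA runtime does not support the fork start method and that spawn or forkserver is required to use CUDA in subprocesses. That may be what triggers the error here, since CUDA is already initialized in the parent before the child processes start. The sketch below takes a different route, threads inside one process, so no new process has to initialize CUDA. This is an assumption on my part rather than something taken from the linked topic. `run_concurrently`, `fake_model1` and `fake_model2` are hypothetical names, and the models are plain-Python stand-ins so the sketch runs anywhere. Whether two threads actually overlap on one GPU is something I have not measured.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(tasks):
    """Run zero-argument callables in separate threads.

    Results are returned in the same order as the tasks.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


# Hypothetical stand-ins for model1(x) and model2(x).
def fake_model1(x):
    return [v * 2 for v in x]


def fake_model2(x):
    return [v + 1 for v in x]


x = [1, 2, 3]
out1, out2 = run_concurrently([lambda: fake_model1(x), lambda: fake_model2(x)])
print(out1, out2)
```

With the real models, the tasks would be `lambda: model1(x)` and `lambda: model2(x)`, and unlike `mp.Process` this returns the outputs to the caller.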