Load Data / Models to a specific CPU

I’m implementing an algorithm that requires a lot of model evaluations, and I want to parallelize these evaluations across CPU cores. The code segment I want to parallelize is below (simplified for readability):

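import torch
from torch.utils.data import DataLoader

def test_img(network, datatest):
    network.eval()  # put dropout/batch-norm layers into inference mode
    # args is assumed to be the script's parsed command-line arguments
    data_loader = DataLoader(datatest, batch_size=args.bs)

    results = []
    with torch.no_grad():  # skip autograd bookkeeping during evaluation
        for data, target in data_loader:
            results.append(network(data))
    return results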

One evaluation takes around 1.5 s on the CPU. The networks being evaluated differ, but the dataset is the same. I tried to use joblib.Parallel to parallelize this process:

from joblib import Parallel, delayed
results = Parallel(n_jobs=num_cpu, prefer="threads")(delayed(test_img)(network_lst[i], dataset) for i in range(N))

However, this gives no improvement (10 evaluations take 15 s). When I time only the result = network(data) line, it accounts for 8 seconds, so I think neither the network evaluation nor the data loading is parallelized. Is there any way to load a network and its data onto a specific CPU core, like data.to('cpu:0')? Is it even possible to parallelize model evaluation on the CPU?

Any suggestions are appreciated!

By default, PyTorch already uses multiple CPU cores for its computations. You can check this by running the script below and watching the processes in top:

import time
import multiprocessing as mp
import torch as t

def subproc():
    # busy-loop for 10 seconds so the process stays visible in "top"
    begin = time.time()
    while time.time() - begin < 10:
        a = t.ones([1000, 1000]) * t.ones([1000, 1000])


if __name__ == "__main__":
    p = mp.Process(target=subproc, args=())
    p2 = mp.Process(target=subproc, args=())
    p.start()
    p2.start()
    print("started")

    p.join()
    p2.join()
    print("joined")
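On the question of putting work on a specific core (the cpu:0 idea): the sketch below is one possible approach, not a PyTorch API. It pins each worker process to one core with os.sched_setaffinity from the standard library. That call exists only on some Unix platforms such as Linux, and the sketch assumes Linux with the "fork" start method. The names pinned_worker and run_pinned are hypothetical. Because PyTorch already runs intra-op work on several threads per process, a common companion step is to call torch.set_num_threads(1) inside each worker before loading the network, so the processes do not compete for the same cores.

```python
import multiprocessing as mp
import os


def pinned_worker(core_id, queue):
    # Restrict the calling process (pid 0) to a single core.
    # os.sched_setaffinity exists only on some Unix platforms, e.g. Linux.
    os.sched_setaffinity(0, {core_id})
    # Report which cores the OS now lets this process run on.
    queue.put((core_id, os.sched_getaffinity(0)))


def run_pinned(core_ids):
    """Start one process per core id and return {core_id: affinity_set}."""
    ctx = mp.get_context("fork")  # assumption: Linux, where "fork" is available
    queue = ctx.Queue()
    procs = [ctx.Process(target=pinned_worker, args=(c, queue)) for c in core_ids]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    available = sorted(os.sched_getaffinity(0))
    # Pin up to two workers, each to a different core this process may use.
    print(run_pinned(available[:2]))
```

In a real evaluation worker, the network and data loading would go after the affinity call, in place of the queue.put report.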

It seems that you are using threads (prefer="threads"), which is not a good fit for Python. Use processes instead:

results = Parallel(n_jobs=num_cpu, prefer="processes")(delayed(test_img)(network_lst[i], dataset) for i in range(N))
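The thread-versus-process point can be illustrated with a small standard-library sketch. It uses pure-Python arithmetic, not PyTorch, so it only demonstrates the general GIL effect: CPU-bound Python code in threads takes turns holding the GIL, while separate processes can run on separate cores. The helper names (cpu_bound, timed_map, threads, processes) are hypothetical. The sketch assumes a Unix system where the "fork" start method is available. Timings depend on the machine, so no expected numbers are given.

```python
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    # Pure-Python arithmetic: the running thread holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(make_executor, n_tasks=4, n=2_000_000):
    """Run n_tasks copies of cpu_bound(n); return (results, elapsed seconds)."""
    start = time.perf_counter()
    with make_executor(n_tasks) as ex:
        results = list(ex.map(cpu_bound, [n] * n_tasks))
    return results, time.perf_counter() - start


def threads(workers):
    return ThreadPoolExecutor(max_workers=workers)


def processes(workers):
    # "fork" keeps the sketch simple on Unix; on Windows, omit mp_context
    # and keep all process-starting code under the __main__ guard.
    return ProcessPoolExecutor(max_workers=workers, mp_context=mp.get_context("fork"))


if __name__ == "__main__":
    _, t_threads = timed_map(threads)
    _, t_procs = timed_map(processes)
    print(f"threads: {t_threads:.2f}s, processes: {t_procs:.2f}s")
```

On a multi-core machine, the process version of this loop is typically faster. The gap for PyTorch models can differ, because each worker process also starts its own pool of PyTorch threads.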