How can I run the calls F(x1), F(x2), F(x3), F(x4) at the same time to speed them up?

There are 4 submodels, x1, x2, x3 and x4, in my training setup, and they are trained separately. In my script, I have to call F(x1,y), F(x2,y), F(x3,y) and F(x4,y) one by one on every iteration to do some computation and update their weights (F is the same function for all four submodels, and y is a common variable). This is slow. I wonder if there is a way to run F(x1,y), F(x2,y), F(x3,y) and F(x4,y) in parallel, with something like multiprocessing, since they all use the same function F.

If each operation inside F is big enough, it is already parallelized at a low level, so you won't gain much by forcing the four calls to run in parallel.

Here is some reference code for parallelism on the CPU.

from multiprocessing import Process, Manager

# Common function
def my_func(x, output_dict):
    out = x**2
    output_dict[x] = out

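# The guard is needed on platforms that start child processes by
# re-importing this module (e.g. Windows and macOS)
if __name__ == "__main__":
    inputs = [1, 2, 3, 4]

    # Create a manager for the shared variable (output_dict)
    manager = Manager()
    output_dict = manager.dict()

    # Start one process per input
    procs = []
    for i in inputs:
        proc = Process(target=my_func, args=(i, output_dict))
        procs.append(proc)
        proc.start()

    # Wait for all processes to finish
    for proc in procs:
        proc.join()

    # Sort the items: insertion order depends on which process finishes first
    print(sorted(output_dict.items()))
    # [(1, 1), (2, 4), (3, 9), (4, 16)]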
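
One caveat with the process-based example: by default each child process works on its own copy of the arguments, so in-place weight updates made inside F would not show up in the parent unless they are sent back explicitly. If F spends most of its time in native code that releases the GIL (many numerical libraries do this for heavy operations), a thread pool may be worth trying instead, since threads share memory. Below is a minimal sketch of that idea. Note that F, run_all and the dict-based "submodels" are hypothetical stand-ins, not the code from the question, and in standard CPython a pure-Python F like this toy one will not actually run faster with threads.

```python
from concurrent.futures import ThreadPoolExecutor


def F(x, y):
    """Hypothetical stand-in for the real update step: modifies x in place."""
    x["weight"] += y
    return x["weight"]


def run_all(submodels, y):
    """Call F(x, y) for every submodel concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=len(submodels)) as pool:
        # Executor.map returns results in the same order as the inputs
        return list(pool.map(lambda x: F(x, y), submodels))


# Four toy "submodels"; plain dicts stand in for real model objects
submodels = [{"weight": w} for w in (1.0, 2.0, 3.0, 4.0)]
results = run_all(submodels, y=0.5)

print(results)
# [1.5, 2.5, 3.5, 4.5]
```

Because the threads share memory with the caller, the entries of submodels are updated in place after run_all returns. Whether this beats the plain sequential loop depends on how much of F runs outside the GIL, so it is worth timing both versions on the real workload.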