Training multiple models on multiple cores in parallel

I'm using Optuna for hyperparameter search, and I'm training my models on the CPU. I call the Optuna function study.optimize(wrapper, n_trials=trails, n_jobs=10).
But I’m not seeing any performance increase compared with a lower value for n_jobs. Also, utilization is only around 20% on every core.
I also tried setting n_jobs to 1 and running the program several times in parallel from the command line. This does bring all my cores to 100% utilization, but it still doesn't seem to give much of a speedup.
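To make the second setup concrete, here is a minimal sketch of "several independent workers, each running its own share of trials sequentially". It is only an illustration under stated assumptions: it swaps the separate command-line calls for Python's standard `concurrent.futures.ProcessPoolExecutor`, it does not use Optuna at all, and `run_trials` and `run_parallel` are hypothetical names with a trivial placeholder computation standing in for real model training.

```python
from concurrent.futures import ProcessPoolExecutor


def run_trials(worker_id: int, n_trials: int) -> list[float]:
    """Hypothetical stand-in for one program call: runs n_trials one after another."""
    results = []
    for trial in range(n_trials):
        # Placeholder "training": a toy score instead of fitting a real model.
        x = (worker_id * n_trials + trial) / 50.0
        results.append((x - 0.5) ** 2)
    return results


def run_parallel(n_workers: int, trials_per_worker: int) -> list[float]:
    """Start n_workers processes; each runs trials_per_worker trials sequentially."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(run_trials, worker_id, trials_per_worker)
            for worker_id in range(n_workers)
        ]
        # Flatten the per-worker result lists, in worker order.
        return [score for future in futures for score in future.result()]


if __name__ == "__main__":
    # Mirrors the "10 program calls - 5 trials" configuration (50 trials in total).
    scores = run_parallel(n_workers=10, trials_per_worker=5)
    print(len(scores), min(scores))
```

In this sketch the workers share nothing, so it only models the process layout, not how the trials would coordinate through a common study.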

I’m preloading my training data into RAM, so data I/O is not the bottleneck.
Some performance timings:

1 program call - 50 trials (total = 50) - n_jobs=5 - 2:10min
1 program call - 50 trials (total = 50) - n_jobs=10 - 2:58min
10 program calls - 5 trials (total = 50) - n_jobs=1 - 2:59min

But I need to run 200+ trials for ~100 models. That’s why I would like some parallelization.
Is there a best/recommended way to train n models on n cores in parallel?