Why is prediction on CPU so slow compared to Keras or GPU?

I am getting very slow prediction performance from PyTorch on CPU. Timings for the same job:

90 minutes - Keras/TensorFlow on 72 processors
<60 minutes - PyTorch on GPU
<60 minutes - Keras/TensorFlow on GPU
11 hours - PyTorch on 72 processors

I read somewhere that PyTorch is a little slower on CPU, but I was not expecting the gap to be this extreme. Is there a magic formula for using PyTorch on CPU?

I load the model with:
torch.load(modelpath, map_location=lambda storage, loc: storage)
To predict, I call model(chunk) on chunks of 5 images each.
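For reference, here is a minimal sketch of the prediction step described above, with two things that commonly affect CPU inference speed made explicit: switching the model to eval mode and disabling gradient tracking. The small network, the `predict` helper, and the 3x64x64 image size are hypothetical stand-ins, not the actual model or data from this job, and whether these settings explain the gap here is an assumption that would need to be measured.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model returned by torch.load(...).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)


def predict(model, chunk):
    """Run a forward pass without building the autograd graph."""
    model.eval()  # use inference behaviour for dropout / batch norm layers
    with torch.no_grad():  # skip gradient bookkeeping during prediction
        return model(chunk)


# A chunk of 5 images; the 3x64x64 shape is assumed for illustration.
chunk = torch.randn(5, 3, 64, 64)
out = predict(model, chunk)
print(out.shape, out.requires_grad)
```

If the real prediction loop calls model(chunk) without torch.no_grad(), wrapping it this way is a cheap thing to try before looking at threading.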

I had OMP_THREAD_LIMIT=1 set, but unsetting it made no difference.
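To check whether a thread limit is still in effect inside the Python process, something like the following sketch could be run before predicting. It reports the OpenMP/MKL environment variables the process actually sees and the number of intra-op threads PyTorch says it will use. The `thread_report` helper is a hypothetical name, and the list of variables is an assumption about which ones matter on this setup.

```python
import os

import torch


def thread_report():
    """Collect the thread-related settings visible to this process."""
    env_names = ("OMP_THREAD_LIMIT", "OMP_NUM_THREADS", "MKL_NUM_THREADS")
    return {
        "cpu_count": os.cpu_count(),
        # Threads PyTorch uses to parallelise a single operator on CPU.
        "torch_intra_op_threads": torch.get_num_threads(),
        # None means the variable is not set in this process.
        "env": {name: os.environ.get(name) for name in env_names},
    }


report = thread_report()
print(report)
```

If `torch_intra_op_threads` comes back as 1 on the 72-processor machine, calling torch.set_num_threads(n) with a larger n before running predictions is one thing to experiment with.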

I just tried a smaller job on my laptop, which has 4 processors:

25 minutes - Keras, CPUs 85-100% utilised
40 minutes - PyTorch, CPUs 45-55% utilised

There is still a noticeable gap, but it is not as big. So what happens as the number of CPUs increases? Is this something that can be resolved, or is PyTorch not suitable for big-data prediction on CPUs?