Conv2D uses only a single CPU core when batch size is one

I’m training an image segmentation network. When I give it a single image as input and run the model on the CPU, it uses only one core and takes more than a minute, but it scales on the GPU. When I simply multiply two tensors on the CPU, that runs on multiple cores.
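In case it helps narrow this down, here is a small sketch for checking whether the environment is capping the CPU thread count. It uses only the Python standard library. The variable list is an assumption: these are names commonly read by OpenMP, MKL, and TensorFlow builds, and whether a particular framework build honors each one is not something this script can confirm. `thread_limits` is a hypothetical helper name, not part of any library.

```python
import os

# Environment variables that commonly cap CPU threads for numeric libraries.
# Assumption: which of these a given framework build actually reads varies.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "TF_NUM_INTRAOP_THREADS",
    "TF_NUM_INTEROP_THREADS",
)


def thread_limits(env=None):
    """Return {variable: value or None} for each thread-related variable."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    # Logical cores visible to the Python process.
    print("logical cores:", os.cpu_count())
    for name, value in thread_limits().items():
        print(f"{name} = {value if value is not None else '(unset)'}")
```

If any of these is set to 1, that could explain single-core behavior for some ops. If they are all unset, the limit is more likely coming from how the convolution kernel itself parallelizes at batch size one.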