CPU slower than GPU on small networks

My CPU is always slower than my GPU when training a 10-unit RNN. The dataset is a time series with input and output dimension both equal to 1. No matter how many timesteps I use for training, 200 or 4000, the CPU is always slower.
Is this normal? Am I missing a library I should install?
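For what it's worth, here is a minimal timing harness I would use to compare the two devices fairly. The helper itself is framework-agnostic; the commented-out usage below it is a sketch that assumes PyTorch (`torch.nn.RNN`, `torch.cuda.synchronize` are standard PyTorch APIs, but the shapes and sizes are just my guesses at your setup):

```python
import time

def avg_time(fn, warmup=3, iters=20):
    """Average wall-clock seconds per call of fn, after a few warmup runs.

    Warmup matters on GPU: the first calls pay one-time kernel/initialization
    costs that should not be counted against steady-state throughput.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical comparison, assuming PyTorch (sizes are illustrative):
# import torch
# rnn = torch.nn.RNN(input_size=1, hidden_size=10)
# x = torch.randn(200, 1, 1)                # (timesteps, batch, features)
# cpu_t = avg_time(lambda: rnn(x))
# rnn_gpu, x_gpu = rnn.cuda(), x.cuda()
# # synchronize() so we measure the kernel, not just the async launch
# gpu_t = avg_time(lambda: (rnn_gpu(x_gpu), torch.cuda.synchronize()))
# print(f"CPU {cpu_t*1e3:.2f} ms   GPU {gpu_t*1e3:.2f} ms")
```

Without the warmup and the explicit GPU synchronization, CPU-vs-GPU timings for tiny models can be misleading in either direction.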

CPU: i5 4690k
GPU: GTX 1060

A 10-unit RNN is quite small; it seems weird that the CPU is slower. Try limiting the CPU math libraries to a single thread, since thread oversubscription can hurt small workloads:


OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python [yourscript.py]
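The same limit can also be applied from inside the script itself, as long as it runs before the numerical library (NumPy/PyTorch/MKL) is first imported, since the thread pools are typically sized when the runtime initializes. A minimal sketch:

```python
import os

# Must run before importing torch/numpy, otherwise the OpenMP/MKL
# thread pools may already be initialized with the default count.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# import torch  # imported only after the env vars are set
```

If this is PyTorch, `torch.set_num_threads(1)` is an in-process alternative for the intra-op thread count.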

@smth Still the same.