The usual reason is that the conv layer is so small, and there is so little data, that the overhead of launching the job on the GPU is higher than the cost of the computation itself.
If you have a very small net with small inputs, you will see no speedup from using a GPU, and it may even run slower than on the CPU.
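
To make the trade-off concrete, here is a rough sketch of a toy cost model in plain Python. It is not a benchmark: the overhead and throughput constants are assumed, illustrative values, and the function names are hypothetical. It only shows how a fixed per-job overhead can dominate when the amount of work is small.

```python
# Toy cost model. Every constant below is an assumed, illustrative value,
# not a measurement of any real hardware.
GPU_OVERHEAD_S = 50e-6    # assumed fixed cost per job (launch + transfers)
GPU_FLOPS_PER_S = 5e12    # assumed GPU throughput
CPU_FLOPS_PER_S = 1e11    # assumed CPU throughput


def cpu_time(flops: float) -> float:
    """Estimated CPU time: pure compute, no fixed overhead in this model."""
    return flops / CPU_FLOPS_PER_S


def gpu_time(flops: float) -> float:
    """Estimated GPU time: fixed overhead plus compute."""
    return GPU_OVERHEAD_S + flops / GPU_FLOPS_PER_S


def speedup(flops: float) -> float:
    """CPU time divided by GPU time; below 1.0 means the GPU is slower."""
    return cpu_time(flops) / gpu_time(flops)


def break_even_flops() -> float:
    """Work size at which both estimates are equal.

    Solves flops / C = O + flops / G for flops.
    """
    return GPU_OVERHEAD_S / (1.0 / CPU_FLOPS_PER_S - 1.0 / GPU_FLOPS_PER_S)


if __name__ == "__main__":
    for flops in (1e4, 1e6, 1e8, 1e10):
        print(f"{flops:.0e} FLOPs: estimated speedup x{speedup(flops):.3f}")
    print(f"break-even at about {break_even_flops():.2e} FLOPs")
```

Under these assumed numbers, small jobs land well below a speedup of 1 and only larger jobs benefit. On real hardware the break-even point depends on the device, the framework, and how much data has to be copied, so measure it rather than relying on a model like this.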