How and when will the GPU be faster than the CPU? Only when the batch size of the tensor is > 1?

I was recently doing some data processing with Torch and ran into an interesting problem. The whole dataset is only a 300 * 3 tensor, and I applied some matrix operations (dot product, torch.cross, etc.) to this tensor.

I tried it on both the CPU and the GPU and found that the time consumed is almost the same (the GPU is even slower, which I think is because I need to move the data from the CPU to the GPU).

So I am very curious about when the GPU will be faster than the CPU. I read some material on parallel computation, but haven't figured it out. Does parallel computation mean that the GPU only processes the tensors within one batch in parallel, and won't parallelize the broadcast computation? So will the GPU only be faster than the CPU when my data tensor has shape batch_size * 300 * 3, with batch_size > 1?

Many thanks!
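For concreteness, here is a minimal sketch of the two workloads being asked about (the names and the batch size of 64 are illustrative, not from the original post):

```python
import torch

# Single sample: the 300 x 3 tensor from the question.
x = torch.randn(300, 3)
y = torch.randn(300, 3)
single = torch.cross(x, y, dim=1)           # shape: (300, 3)

# Batched variant: batch_size x 300 x 3; the same op is applied
# along the last dimension for every sample in the batch.
batch = torch.randn(64, 300, 3)
other = torch.randn(64, 300, 3)
batched = torch.cross(batch, other, dim=2)  # shape: (64, 300, 3)

print(single.shape, batched.shape)
```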

Generally, you might not see the expected speedup on the GPU if you are using a lot of “small” operations and are thus limited by the memory bandwidth.

That being said, even with a batch size of 1 you would see a speedup, if the computational workload of the model is not tiny.

Could you post your model definition so that we could have a look at it, please?