Hello,

I was recently doing some data processing with PyTorch and ran into an interesting problem. The whole dataset is only a 300 × 3 tensor, and I applied some vector and matrix operations (dot products, torch.cross, etc.) to it.
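For concreteness, here is a minimal sketch of the kind of operations described. It assumes a second 300 × 3 tensor `b` and a 3 × 3 matrix `m`. Both are hypothetical, because the post only mentions one data tensor.

```python
import torch

# Hypothetical inputs: two 300 x 3 tensors of 3-D vectors and a 3 x 3 matrix.
a = torch.randn(300, 3)
b = torch.randn(300, 3)
m = torch.randn(3, 3)

# Row-wise dot product: one scalar per row -> shape (300,)
dots = (a * b).sum(dim=-1)

# Row-wise cross product; dim is passed explicitly so the size-3 axis is used.
crosses = torch.cross(a, b, dim=-1)  # shape (300, 3)

# Matrix product of the data with a 3 x 3 matrix -> shape (300, 3)
projected = a @ m
```

Each of these calls touches only about 900 numbers, which is a very small amount of work per call.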

I tried it on both the CPU and the GPU and found that the time consumed was almost the same (the GPU was even slower, which I think is because I need to move the data from the CPU to the GPU).
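A fair comparison can separate the host-to-device copy from the computation itself. The following is a minimal sketch under these assumptions: inputs are moved to the device before the clock starts, one warm-up call is made, and `torch.cuda.synchronize()` is called before reading the clock. That last step matters because CUDA kernels are launched asynchronously. The helper names `time_op` and `cross_op` are hypothetical.

```python
import time
import torch

def time_op(fn, *args, device, repeats=100):
    """Average seconds per call of fn(*args) on `device` (hypothetical helper).

    Inputs are copied to `device` before timing, so the transfer cost is
    excluded from the measurement.
    """
    args = [x.to(device) for x in args]
    fn(*args)  # warm-up: the first call may include one-time setup costs
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the timed kernels to actually finish
    return (time.perf_counter() - start) / repeats

def cross_op(a, b):
    return torch.cross(a, b, dim=-1)

a, b = torch.randn(300, 3), torch.randn(300, 3)
print("cpu :", time_op(cross_op, a, b, device=torch.device("cpu")))
if torch.cuda.is_available():
    print("cuda:", time_op(cross_op, a, b, device=torch.device("cuda")))
```

Timing the `.to(device)` call separately would show how much of the GPU slowdown comes from the transfer rather than from the operation.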

So I am very curious: when is the GPU faster than the CPU? I have read some material about parallel computation but haven't figured it out. Does parallel computation mean that the GPU only processes the tensors in a batch in parallel, and does not parallelize the broadcasting computation? In other words, will the GPU only be faster than the CPU when my data tensor has shape batch_size × 300 × 3 with batch_size > 1?
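One way to probe this question empirically is to run the same operation on a batch_size × 300 × 3 tensor in a single call and compare it with a Python loop that makes one call per 300 × 3 slice. The sketch below does this and checks that both approaches agree. The batch size of 64 and the helper names are hypothetical. A single unwarmed run is noisy, so the printed times are only indicative.

```python
import time
import torch

def batched_cross(a, b):
    # One call over the whole (batch_size, 300, 3) tensor.
    return torch.cross(a, b, dim=-1)

def looped_cross(a, b):
    # One call per batch element, each on a (300, 3) slice.
    return torch.stack([torch.cross(x, y, dim=-1) for x, y in zip(a, b)])

def wall_time(fn, *args):
    dev = args[0].device
    if dev.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args)
    if dev.type == "cuda":
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

batch_size = 64  # hypothetical value
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(batch_size, 300, 3, device=device)
b = torch.randn(batch_size, 300, 3, device=device)

out_b, t_b = wall_time(batched_cross, a, b)
out_l, t_l = wall_time(looped_cross, a, b)
print(f"{device}: batched {t_b:.6f}s, looped {t_l:.6f}s")
```

Varying `batch_size` (or the 300-row dimension) and rerunning on both devices shows how the timings scale with the total number of elements per call.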

Many thanks!