Are operations like torch.bmm and torch.ger faster on GPU than on CPU?

I need to run some tensor operations such as torch.bmm, torch.eig and torch.ger. For some reason, if I move the tensors to the CPU and do the calculations there, they run faster. Is this expected?

There are several cases where running on the CPU is faster than running on the GPU. One common case is when the inputs are small, since the fixed overhead of launching GPU kernels then outweighs the time saved on the computation itself. How large are your inputs?
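One way to check this on your own machine is to time the same operation on both devices. The snippet below is only a rough sketch: the helper names time_op and compare are made up for illustration, and the shapes you pass in are up to you. It assumes a working PyTorch install and simply skips the GPU if CUDA is not available. The synchronize calls matter because CUDA kernels are launched asynchronously, so without them the timer can stop before the GPU has finished.

```python
import time
import torch


def time_op(fn, device, repeats=20):
    """Return the mean wall-clock seconds per call of fn() (hypothetical helper)."""
    # Warm-up call so one-time costs (CUDA context setup, kernel loading)
    # are not included in the measurement.
    fn()
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device.type == "cuda":
        # CUDA kernels run asynchronously; wait for them before stopping the clock.
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


def compare(batch, n, k, m):
    """Time torch.bmm on (batch, n, k) x (batch, k, m) inputs per available device."""
    devices = [torch.device("cpu")]
    if torch.cuda.is_available():
        devices.append(torch.device("cuda"))
    results = {}
    for device in devices:
        # Create the inputs directly on the target device so that
        # host-to-device copies are not part of the timing.
        a = torch.randn(batch, n, k, device=device)
        b = torch.randn(batch, k, m, device=device)
        results[device.type] = time_op(lambda: torch.bmm(a, b), device)
    return results
```

Calling compare with a tiny shape and then with a much larger one, and printing both results, should show whether the GPU only starts to pay off above some input size on your hardware. The actual numbers depend heavily on the CPU, the GPU and the PyTorch build.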

I am working with a batch size of 100 and tensors of size (784, 300), (300, 100) and (100, 10). Mostly, though, I work on the average over the batch: I use torch.mean to average the tensors over the batch dimension, and then apply torch.bmm, torch.ger and torch.eig to tensors of size (784, 300), (300, 100) and (100, 10).