BiLSTM training slower on GPU than CPU

I’m new to PyTorch, and I built a typical BiLSTM+CRF sequence labeling model for an NER task.

Embedding layer - 100-dim word2vec (Chinese characters)
Hidden dim - 100
Recurrent cell - GRU (bidirectional)
Dropout - 0.2
Optimizer - AdamW
Batch size - 128
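
Here’s roughly how the model is put together, as a simplified sketch (the class name, vocab size, and tag count below are just placeholders, and the CRF layer is omitted):

```python
import torch
import torch.nn as nn

class BiGRUTagger(nn.Module):
    """Sketch of the encoder; the real model adds a CRF on top of the tag scores."""
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=100):
        super().__init__()
        # 100-dim character embeddings (initialized from word2vec in the real model)
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional GRU with 100 hidden units per direction
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.2)
        # projection to per-token tag scores (fed into the CRF in the real model)
        self.fc = nn.Linear(hidden_dim * 2, num_tags)

    def forward(self, x):
        emb = self.dropout(self.embedding(x))   # (batch, seq_len, embed_dim)
        out, _ = self.rnn(emb)                  # (batch, seq_len, 2 * hidden_dim)
        return self.fc(self.dropout(out))       # (batch, seq_len, num_tags)

# placeholder sizes; batches of 128 sentences in the real setup
model = BiGRUTagger(vocab_size=5000, num_tags=10)
optimizer = torch.optim.AdamW(model.parameters())
```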

I’ve tried training this model on a 1080Ti, a 2080Ti, and my Mac CPU (8 cores), and it turns out that training on the Mac CPU is much faster than on either the 1080Ti or the 2080Ti. I’m confused by this; has anyone encountered the same issue?

Speed (minutes’seconds"):

|          | CPU   | 1080Ti / 2080Ti |
|----------|-------|-----------------|
| 1 epoch  | 3’20" | 18’20"          |
| evaluate | 20"   | 200"            |