PyTorch N-gram word embedding example takes a very long time per epoch?

I’m running the N-gram language modeling example from the PyTorch word embeddings tutorial on my own dataset. After building the context windows, I end up with 4 million trigrams.
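For reference, here is essentially what I’m running: a trimmed-down sketch of the tutorial’s per-example training loop, with a toy sentence standing in for my 4 million trigrams (the model and names follow the tutorial; the dataset here is just a placeholder):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CONTEXT_SIZE = 2
EMBEDDING_DIM = 10

# Toy data standing in for my real corpus (which yields ~4M trigrams).
sentence = "the quick brown fox jumps over the lazy dog".split()
trigrams = [((sentence[i], sentence[i + 1]), sentence[i + 2])
            for i in range(len(sentence) - 2)]
vocab = sorted(set(sentence))
word_to_ix = {w: i for i, w in enumerate(vocab)}

class NGramLanguageModeler(nn.Module):
    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        return F.log_softmax(self.linear2(out), dim=1)

model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
loss_function = nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# One epoch: a separate forward/backward pass *per trigram*, so with my
# real data this Python-level loop body runs 4 million times per epoch.
total_loss = 0.0
for context, target in trigrams:
    context_idxs = torch.tensor([word_to_ix[w] for w in context],
                                dtype=torch.long)
    model.zero_grad()
    log_probs = model(context_idxs)
    loss = loss_function(log_probs,
                         torch.tensor([word_to_ix[target]], dtype=torch.long))
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
```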

The problem is that training is extremely slow: a single epoch over the 4 million trigrams takes 48 hours on CPU. After moving the model to the GPU, an epoch still takes 20 minutes, which seems like a lot. Is this normal?

It’s not a slow system either: an i7-6700K, 16 GB of RAM, an NVMe SSD, and an NVIDIA GTX 980 Ti.