Bidirectional GRU ~4x slower than Unidirectional GRU

As the title says, I am seeing a ~4x slowdown when I switch from a unidirectional to a bidirectional GRU. I am using a packed sequence as input.

If I switch to LSTMs, the behavior seems more reasonable: a ~2x slowdown.

Why would a bidirectional GRU be ~4x slower than a unidirectional one?
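
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes PyTorch (`nn.GRU`, `nn.LSTM`, `pack_padded_sequence`), runs on CPU, and uses made-up sizes (batch 16, max length 50, input size 32, hidden size 64); the helper name `time_rnn` is hypothetical. None of these values are my real setup, so the ratios it prints may not match the numbers above.

```python
import time

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

INPUT_SIZE, HIDDEN_SIZE = 32, 64  # illustrative sizes, not from the original setup


def time_rnn(rnn_cls, bidirectional, packed, repeats=5):
    """Return the mean wall-clock seconds per forward pass (hypothetical helper)."""
    rnn = rnn_cls(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE,
                  batch_first=True, bidirectional=bidirectional)
    rnn.eval()
    with torch.no_grad():
        rnn(packed)  # warm-up pass, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            rnn(packed)
        elapsed = time.perf_counter() - start
    return elapsed / repeats


torch.manual_seed(0)
batch, max_len = 16, 50
x = torch.randn(batch, max_len, INPUT_SIZE)
lengths = torch.randint(1, max_len + 1, (batch,))  # variable sequence lengths
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)

# Ratio of bidirectional to unidirectional forward time for each cell type.
results = {}
for cls in (nn.GRU, nn.LSTM):
    uni = time_rnn(cls, False, packed)
    bi = time_rnn(cls, True, packed)
    results[cls.__name__] = bi / uni
    print(f"{cls.__name__}: bidirectional / unidirectional = {results[cls.__name__]:.2f}x")
```

If the model runs on a GPU, the timing would also need a `torch.cuda.synchronize()` call before reading the clock, since CUDA kernels launch asynchronously.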