The third weight matrix is for the output. When an RNN produces an output y_t in addition to its hidden state h_t, a third weight matrix, W_y, maps the hidden state to that output. The PyTorch doc gives only the hidden-state update equation, which involves just two weight matrices.
On the left, the PyTorch docs fix the activation to tanh, while the equation on the right uses a generic activation function. The right-hand equation also has a single bias term, whereas the PyTorch formulation carries two (one for the input term and one for the hidden term).
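A quick sketch makes the two-weight point concrete: inspecting the parameters of `nn.RNN` shows only the input-to-hidden and hidden-to-hidden matrices (plus the two biases), so the output projection W_y has to be added separately, e.g. with an `nn.Linear` layer. The sizes below (input 4, hidden 3, output 2) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# nn.RNN implements only the hidden-state update:
#   h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
# so it stores exactly two weight matrices and two bias vectors.
rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
print([name for name, _ in rnn.named_parameters()])
# ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']

# W_y is not part of nn.RNN; supply it yourself as a Linear layer.
w_y = nn.Linear(3, 2)      # hidden size 3 -> output size 2

x = torch.randn(1, 5, 4)   # (batch, seq_len, input_size)
h_all, h_last = rnn(x)     # h_all holds the hidden state at every step
y = w_y(h_all)             # y_t = W_y h_t + b_y at every step
print(y.shape)             # torch.Size([1, 5, 2])
```

Note that `nn.RNN` returns hidden states, not outputs; the `Linear` layer is what turns h_t into y_t, mirroring the third weight matrix in the generic equation.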