In the docs of `nn.GRU`, the new gate is computed as

n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})),

where h is first multiplied by W_{hn} (and b_{hn} is added), and only then is the dot product with r taken.

In the original GRU paper, however, the dot product of r and h is computed first, and the result is then multiplied by W_{hn} (see Eq. (8)).
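Writing the two orderings side by side makes the difference explicit. This is a sketch that drops the biases (Eq. (8) of the original paper has none) and writes ⊙ for the elementwise product and diag(r) for the diagonal matrix built from r:

```latex
\begin{aligned}
\text{nn.GRU docs:}\quad   & r \odot (W_{hn} h) = \operatorname{diag}(r)\, W_{hn}\, h \\
\text{original paper:}\quad & W_{hn} (r \odot h) = W_{hn}\, \operatorname{diag}(r)\, h
\end{aligned}
```

The two agree for every h only if diag(r) W_{hn} = W_{hn} diag(r), i.e. r_i (W_{hn})_{ij} = (W_{hn})_{ij} r_j for all i, j. Whenever r_i ≠ r_j, this forces (W_{hn})_{ij} = 0, so for general gate values it holds only when W_{hn} is diagonal.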

The two forms are not equivalent. Is this an error, or is there any supporting material for the former form? Thanks.

Here, * does not denote the dot product, but the elementwise product. Still, the two forms are in fact not equivalent.
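A quick numeric check of the non-equivalence, as a minimal pure-Python sketch. The helper names `matvec` and `hadamard` and the toy values of `W_hn`, `h` and `r` are hypothetical and chosen only for illustration. Biases are omitted, as in Eq. (8).

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def hadamard(a, b):
    """Elementwise product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]

# Arbitrary toy values (hypothetical)
W_hn = [[1.0, 2.0],
        [3.0, 4.0]]
h = [1.0, 1.0]
r = [1.0, 0.0]  # reset gate: keep the first unit, reset the second

# nn.GRU docs: reset gate applied AFTER the matrix product
reset_after = hadamard(r, matvec(W_hn, h))
# Original paper, Eq. (8): reset gate applied BEFORE the matrix product
reset_before = matvec(W_hn, hadamard(r, h))

print(reset_after)   # [3.0, 0.0]
print(reset_before)  # [1.0, 3.0]
```

Because W_hn is not diagonal, the reset second unit still leaks into the first output in one ordering but not the other.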

For supporting material, see this paper (especially the first footnote):

This change from the original GRU allows all of the `W_{...} h` operations to be performed at once, as a single matrix multiplication, which improves performance.
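A minimal pure-Python sketch of why the reordering helps. It assumes hypothetical toy integer weights and helper names. In the `nn.GRU` form, W_{hr} h, W_{hz} h and W_{hn} h all take the same input h, so stacking the three weight matrices row-wise lets one product compute all of them. In the original form, W_{hn} multiplies r * h, and r itself depends on W_{hr} h, so that product has to wait until r is known.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

hidden_size = 2
# Toy weights for the three hidden-to-hidden maps (arbitrary, hypothetical)
W_hr = [[1, 2], [3, 4]]
W_hz = [[5, 6], [7, 8]]
W_hn = [[9, 10], [11, 12]]
h = [2, 1]

# Stack the rows into one (3*hidden_size x hidden_size) matrix,
# mirroring the (W_hr|W_hz|W_hn) layout nn.GRU uses for weight_hh_l0.
W_hh = W_hr + W_hz + W_hn  # list concatenation stacks the rows

# One matrix-vector product yields all three results at once ...
gh = matvec(W_hh, h)
# ... which are then split back into per-gate chunks.
gh_r, gh_z, gh_n = (gh[i * hidden_size:(i + 1) * hidden_size] for i in range(3))

print(gh_r, gh_z, gh_n)  # [4, 10] [16, 22] [28, 34]
```

On a GPU, one large matrix multiplication is generally cheaper than several small, sequentially dependent ones, which is the performance gain the footnote refers to.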