nn.parallel.data_parallel on GRU

Hello,
I want to apply function nn.parallel.data_parallel on GRU, how ever when I am trying to run following code, there is a error:

import torch
import torch.nn as nn
from torch.autograd import Variable

a = nn.GRU(100, 20, 1, batch_first=True)
input_variable = Variable(torch.rand(15, 1, 100).cuda())  # (batch, seq, input) since batch_first=True
hidden_state = Variable(torch.rand(1, 15, 20).cuda())     # (num_layers, batch, hidden)
a.cuda()

output, _ = nn.parallel.data_parallel(a, (input_variable, hidden_state), [0, 1, 2])
Traceback (most recent call last):
  File "test2.py", line 10, in <module>
    output, _ = nn.parallel.data_parallel(a, (input_variable, hidden_state), [0, 1, 2])
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 105, in data_parallel
    outputs = parallel_apply(replicas, inputs, module_kwargs, used_device_ids)
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
RuntimeError: Expected hidden size (1, 5L, 20), got (1L, 15L, 20L)
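The shapes in the error hint at the cause: data_parallel scatters every input along dim 0, so the (15, 1, 100) input is split into batches of 5 per GPU, but the hidden state is laid out as (num_layers, batch, hidden) = (1, 15, 20), so its batch axis is never split and each replica still receives all 15. One possible workaround (a sketch, not an official API) is to wrap the GRU in a small module that accepts the hidden state batch-first and transposes it internally, so that both inputs can be scattered along dim 0. The class name BatchFirstGRU and the batch-first hidden layout are assumptions for illustration; the CPU check below uses plain tensors rather than the deprecated Variable wrapper:

```python
import torch
import torch.nn as nn

class BatchFirstGRU(nn.Module):
    """Hypothetical wrapper: accepts the hidden state batch-first so
    data_parallel can scatter it along dim 0 like the input."""
    def __init__(self, input_size, hidden_size, num_layers):
        super(BatchFirstGRU, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True)

    def forward(self, input, hidden):
        # hidden arrives as (batch, num_layers, hidden_size);
        # transpose to the (num_layers, batch, hidden_size) layout
        # nn.GRU expects; .contiguous() because cuDNN requires it
        output, h_n = self.gru(input, hidden.transpose(0, 1).contiguous())
        # hand the new hidden state back batch-first as well
        return output, h_n.transpose(0, 1)

# CPU sanity check of the wrapper itself
model = BatchFirstGRU(100, 20, 1)
x = torch.rand(15, 1, 100)   # (batch, seq, input)
h0 = torch.rand(15, 1, 20)   # (batch, num_layers, hidden) -- batch-first
out, h_n = model(x, h0)
print(out.shape, h_n.shape)
```

On the GPU you would then pass the batch-first hidden state, e.g. nn.parallel.data_parallel(model.cuda(), (x.cuda(), h0.cuda()), [0, 1, 2]), and both tensors get split into chunks of 5 along the batch dimension.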