GRU works even with incorrect initial hidden state

In the PyTorch docs for nn.GRU I found this:

h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.
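
For concreteness, here is a minimal toy sketch (my own numbers, not from the docs) of what that shape means and of the zero default:

import torch
import torch.nn as nn
from torch.autograd import Variable

gru = nn.GRU(input_size=4, hidden_size=3, num_layers=2, batch_first=True)
inp = Variable(torch.randn(5, 6, 4))                  # (batch, seq, input_size)
h0 = Variable(torch.zeros(2 * 1, 5, 3))               # (num_layers * num_directions, batch, hidden_size)

out_default, _ = gru(inp)                             # h_0 omitted -> defaults to zeros
out_zeros, _ = gru(inp, h0)                           # h_0 passed explicitly as zeros
print(torch.equal(out_default.data, out_zeros.data)) # True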

However, while playing around and learning about GRUs, I came across a case where the GRU's forward pass succeeds even when the size of the initial hidden state is incorrect. This can be seen in the code below:

Can someone please shed light on this behaviour?

Code:

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 3
in_dim = 12
hid_dim = 5
timesteps = 7
num_layers = 2
gru = nn.GRU(in_dim, hid_dim, bidirectional=False, batch_first=True, num_layers=num_layers)
inp = Variable(torch.Tensor(batch_size, timesteps, in_dim).normal_(-0.1, 0.1))

print('# Batch size correctly provided as 3')
x = torch.Tensor(num_layers, 3, hid_dim).normal_(-0.1, 0.1) # Batch size correctly provided as 3
hid = Variable(x)
o, h = gru(inp, hid)
print(o.size())
print(h.size())

print('# Batch size wrongly provided as 1')
x = torch.Tensor(num_layers, 1, hid_dim).normal_(-0.1, 0.1) # Batch size wrongly provided as 1
hid = Variable(x)
o, h = gru(inp, hid)
print(o.size())
print(h.size())

print('# Batch size wrongly provided as 2')
x = torch.Tensor(num_layers, 2, hid_dim).normal_(-0.1, 0.1) # Batch size wrongly provided as 2
hid = Variable(x)
o, h = gru(inp, hid)
print(o.size())
print(h.size())

Output:

Batch size correctly provided as 3 in initial hidden state

torch.Size([3, 7, 5])
torch.Size([2, 3, 5])

Batch size wrongly provided as 1 in initial hidden state

torch.Size([3, 7, 5])
torch.Size([2, 3, 5])

Batch size wrongly provided as 2 in initial hidden state


RuntimeError Traceback (most recent call last)
in ()
24 x = torch.Tensor(num_layers, 2, hid_dim).normal_(-0.1, 0.1) # Batch size wrongly provided as 2
25 hid = Variable(x)
---> 26 o, h = gru(inp, hid)
27 print(o.size())
28 print(h.size())

~/.local/lib/python3.5/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
323 for hook in self._forward_pre_hooks.values():
324 hook(self, input)
--> 325 result = self.forward(*input, **kwargs)
326 for hook in self._forward_hooks.values():
327 hook_result = hook(self, input, result)

~/.local/lib/python3.5/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
167 flat_weight=flat_weight
168 )
--> 169 output, hidden = func(input, self.all_weights, hx)
170 if is_packed:
171 output = PackedSequence(output, batch_sizes)

~/.local/lib/python3.5/site-packages/torch/nn/_functions/rnn.py in forward(input, *fargs, **fkwargs)
383 return hack_onnx_rnn((input,) + fargs, output, args, kwargs)
384 else:
--> 385 return func(input, *fargs, **fkwargs)
386
387 return forward

~/.local/lib/python3.5/site-packages/torch/nn/_functions/rnn.py in forward(input, weight, hidden)
243 input = input.transpose(0, 1)
244
--> 245 nexth, output = func(input, hidden, weight)
246
247 if batch_first and batch_sizes is None:

~/.local/lib/python3.5/site-packages/torch/nn/_functions/rnn.py in forward(input, hidden, weight)
83 l = i * num_directions + j
84
---> 85 hy, output = inner(input, hidden[l], weight[l])
86 next_hidden.append(hy)
87 all_output.append(output)

~/.local/lib/python3.5/site-packages/torch/nn/_functions/rnn.py in forward(input, hidden, weight)
112 steps = range(input.size(0) - 1, -1, -1) if reverse else range(input.size(0))
113 for i in steps:
--> 114 hidden = inner(input[i], hidden, *weight)
115 # hack to handle LSTM
116 output.append(hidden[0] if isinstance(hidden, tuple) else hidden)

~/.local/lib/python3.5/site-packages/torch/nn/_functions/rnn.py in GRUCell(input, hidden, w_ih, w_hh, b_ih, b_hh)
58 h_r, h_i, h_n = gh.chunk(3, 1)
59
---> 60 resetgate = F.sigmoid(i_r + h_r)
61 inputgate = F.sigmoid(i_i + h_i)
62 newgate = F.tanh(i_n + resetgate * h_n)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0

This is a bug, and it has been fixed on the master branch of PyTorch: https://github.com/pytorch/pytorch/pull/3925
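
For anyone curious why the batch-of-1 case passed silently: judging from the traceback, the gate computation is plain elementwise arithmetic, so a (1, hidden) hidden state broadcasts against the (batch, hidden) input projection, while a (2, hidden) one cannot. A rough sketch of that behaviour (my own toy tensors, not PyTorch internals):

import torch

i_r = torch.randn(3, 5)        # input-side gate term for a batch of 3
h_r_one = torch.randn(1, 5)    # hidden state with batch size 1: broadcasts
h_r_two = torch.randn(2, 5)    # hidden state with batch size 2: cannot broadcast

print((i_r + h_r_one).size())  # torch.Size([3, 5]) -- no error, silent broadcast
try:
    i_r + h_r_two
except RuntimeError as e:
    print(e)                   # the same size-mismatch error as in the traceback above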


Thanks @richard :)

And I found that this only happens when the GRU (or other RNN layers) is on the CPU and the hidden state is a plain Tensor (instead of a CUDA tensor). I had thought that this was a feature specifically available in the CPU implementation.
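
In case it helps anyone else, here is a small sketch (my own suggestion, using the current PyTorch API) of building h_0 with the batch size taken from the input and on the same device as the module, so the CPU and CUDA paths behave identically:

import torch
import torch.nn as nn

num_layers, hid_dim, in_dim = 2, 5, 12
gru = nn.GRU(in_dim, hid_dim, num_layers=num_layers, batch_first=True)
# gru = gru.cuda()  # uncomment to run the same code on the GPU

inp = torch.randn(3, 7, in_dim, device=next(gru.parameters()).device)
h0 = torch.zeros(num_layers, inp.size(0), hid_dim, device=inp.device)

out, h = gru(inp, h0)
print(out.size(), h.size())    # torch.Size([3, 7, 5]) torch.Size([2, 3, 5])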