Bidirectional RNN hidden states and output

I wanted to check if my understanding of the BiRNN output is correct.

Following is the code.
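
For context, a setup along the following lines produces out and hidden with the shapes shown below (the input size and random data are illustrative; only the shapes are inferred from the printed output):

import torch
import torch.nn as nn

HIDDEN_SIZE = 2
# Single-layer bidirectional RNN; batch_first=True matches the
# (batch, seq_len, num_directions*hidden_size) layout of out.
rnn = nn.RNN(input_size=1, hidden_size=HIDDEN_SIZE, num_layers=1,
             batch_first=True, bidirectional=True)
x = torch.randn(4, 5, 1)  # (batch=4, seq_len=5, input_size=1)
out, hidden = rnn(x)      # out: (4, 5, 4), hidden: (2, 4, 2)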

print('\nOutput:\n', out)
print("\nForward Outputs: \n", out[:, :, :HIDDEN_SIZE])
print("\nBackward Outpus: \n", out[:, :, HIDDEN_SIZE:])

print('\nHidden:\n', hidden)
print("\nForward Hidden: \n", hidden[:, :, 0])
print("\nBackward Hidden: \n", hidden[:, :, 1])

Following is the output.

Output:
 tensor([[[ 0.0908,  0.5328,  0.6399, -0.4098],
         [ 0.4033,  0.8654,  0.3567, -0.6038],
         [ 0.6768,  0.9732,  0.0298, -0.7668],
         [ 0.8282,  0.9944, -0.2225, -0.8903],
         [ 0.9032,  0.9986, -0.3265, -0.9737]],

        [[ 0.8729,  0.9987, -0.7337, -0.9683],
         [ 0.9645,  0.9999, -0.8254, -0.9868],
         [ 0.9797,  1.0000, -0.8842, -0.9947],
         [ 0.9878,  1.0000, -0.9168, -0.9981],
         [ 0.9926,  1.0000, -0.8730, -0.9998]],

        [[ 0.9890,  1.0000, -0.9665, -0.9997],
         [ 0.9973,  1.0000, -0.9777, -0.9999],
         [ 0.9984,  1.0000, -0.9851, -1.0000],
         [ 0.9990,  1.0000, -0.9899, -1.0000],
         [ 0.9994,  1.0000, -0.9820, -1.0000]],

        [[ 0.9991,  1.0000, -0.9956, -1.0000],
         [ 0.9998,  1.0000, -0.9970, -1.0000],
         [ 0.9999,  1.0000, -0.9980, -1.0000],
         [ 0.9999,  1.0000, -0.9987, -1.0000],
         [ 1.0000,  1.0000, -0.9976, -1.0000]]], grad_fn=<TransposeBackward1>)

Forward Outputs: 
 tensor([[[0.0908, 0.5328],
         [0.4033, 0.8654],
         [0.6768, 0.9732],
         [0.8282, 0.9944],
         [0.9032, 0.9986]],

        [[0.8729, 0.9987],
         [0.9645, 0.9999],
         [0.9797, 1.0000],
         [0.9878, 1.0000],
         [0.9926, 1.0000]],

        [[0.9890, 1.0000],
         [0.9973, 1.0000],
         [0.9984, 1.0000],
         [0.9990, 1.0000],
         [0.9994, 1.0000]],

        [[0.9991, 1.0000],
         [0.9998, 1.0000],
         [0.9999, 1.0000],
         [0.9999, 1.0000],
         [1.0000, 1.0000]]], grad_fn=<SliceBackward>)

Backward Outputs: 
 tensor([[[ 0.6399, -0.4098],
         [ 0.3567, -0.6038],
         [ 0.0298, -0.7668],
         [-0.2225, -0.8903],
         [-0.3265, -0.9737]],

        [[-0.7337, -0.9683],
         [-0.8254, -0.9868],
         [-0.8842, -0.9947],
         [-0.9168, -0.9981],
         [-0.8730, -0.9998]],

        [[-0.9665, -0.9997],
         [-0.9777, -0.9999],
         [-0.9851, -1.0000],
         [-0.9899, -1.0000],
         [-0.9820, -1.0000]],

        [[-0.9956, -1.0000],
         [-0.9970, -1.0000],
         [-0.9980, -1.0000],
         [-0.9987, -1.0000],
         [-0.9976, -1.0000]]], grad_fn=<SliceBackward>)

Hidden:
 tensor([[[ 0.9032,  0.9986],
         [ 0.9926,  1.0000],
         [ 0.9994,  1.0000],
         [ 1.0000,  1.0000]],

        [[ 0.6399, -0.4098],
         [-0.7337, -0.9683],
         [-0.9665, -0.9997],
         [-0.9956, -1.0000]]], grad_fn=<StackBackward>)

Forward Hidden: 
 tensor([[ 0.9032,  0.9926,  0.9994,  1.0000],
        [ 0.6399, -0.7337, -0.9665, -0.9956]], grad_fn=<SelectBackward>)

Backward Hidden: 
 tensor([[ 0.9986,  1.0000,  1.0000,  1.0000],
        [-0.4098, -0.9683, -0.9997, -1.0000]], grad_fn=<SelectBackward>)

Is my understanding of the BiRNN correct?

You might want to have a look at a previous post of mine, where I had a similar question.

In general, you should double-check against the documentation. According to it:

  • hidden.shape = (num_layers*num_directions, batch, hidden_size)
  • layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size)

After view(), the direction dimension is at index 1, so I would say it should look like this:

hidden = hidden.view(num_layers, 2, batch, hidden_size) # 2 for bidirectional
print("\nForward Hidden: \n", hidden[:, 0, :, :])
print("\nBackward Hidden: \n", hidden[:, 1, :, :])

Similarly for the output. According to the documentation:

  • output.shape = (seq_len, batch, num_directions*hidden_size)
  • directions can be separated using output.view(seq_len, batch, num_directions, hidden_size)

Here the direction dimension is at index 2, yielding the following:

output = out.view(seq_len, batch, 2, hidden_size) # 2 for bidirectional
print("\nForward Outputs: \n", output[:, :, 0, :])
print("\nBackward Outputs: \n", output[:, :, 1, :])

Your current approach for the output might be correct, but first splitting num_directions and hidden_size seems cleaner and easier to understand. Your approach for hidden, however, might be wrong, since num_directions is not (part of) the last dimension of hidden.
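
One way to convince yourself (a sketch, assuming batch_first=True, which the shapes in your printout suggest — note that the doc shapes quoted above are for batch_first=False, so with batch_first=True the first two dimensions of output swap): for a single-layer bidirectional RNN, the forward hidden state equals the forward half of the output at the last time step, and the backward hidden state equals the backward half of the output at the first time step.

# Sanity check, assuming batch_first=True, num_layers=1,
# batch=4, seq_len=5, HIDDEN_SIZE=2:
h = hidden.view(1, 2, 4, HIDDEN_SIZE)  # (layers, directions, batch, hidden)
o = out.view(4, 5, 2, HIDDEN_SIZE)     # (batch, seq_len, directions, hidden)
print(torch.allclose(h[-1, 0], o[:, -1, 0]))  # True: forward h_n == forward output at last step
print(torch.allclose(h[-1, 1], o[:, 0, 1]))   # True: backward h_n == backward output at first step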

I hope that gives at least some pointers.
