How are the outputs of each layer of multi-layer bidirectional RNNs fed to the subsequent layer?

This question occurred to me while working with multi-layer BLSTMs for voice activity detection. There are two underlying considerations:
First, in a unidirectional multi-layer LSTM, the input to every layer after the first is the sequence of hidden states produced by the previous layer.
Second, the final layer of a bidirectional multi-layer RNN outputs, at each time step, a tensor of size (2 × hidden_size): the hidden states of the forward and backward directions concatenated.
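
For concreteness, here is a minimal sketch of the second observation, assuming PyTorch's `nn.LSTM` (the dimensions are arbitrary and chosen only for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, for illustration only
seq_len, batch, input_size, hidden_size, num_layers = 10, 4, 40, 64, 3

lstm = nn.LSTM(input_size, hidden_size,
               num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# Final layer's output: forward and backward hidden states
# concatenated at each time step.
print(output.shape)  # torch.Size([10, 4, 128]) = (seq_len, batch, 2*hidden_size)

# h_n holds the last hidden state of every layer and direction.
print(h_n.shape)     # torch.Size([6, 4, 64]) = (num_layers*2, batch, hidden_size)
```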
Hence my question: when feeding from one bidirectional layer to the next, how are the outputs of the two directions combined?