I have a data sequence a of shape [seq_len, 2], where seq_len is the length of the sequence. Each of the columns a[:, 0] and a[:, 1] is correlated in time, but the two columns are independent of each other. For training I prepare data of shape [batch_size, seq_len, 2]. The documentation for the BRNN that I use describes its initialization parameters as

input_size – The number of expected features in the input x

hidden_size – The number of features in the hidden state h

What does “number of expected features” mean? Since there is correlation along the seq_len axis, should input_size be set to seq_len and the input be permuted? Thanks.

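input_size, or “number of expected features”, denotes the dimensionality of a single observation at one timestep; in this case, 2. The correlation along the seq_len axis is what the recurrence itself models, so input_size should not be set to seq_len.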

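Also, by default nn.RNN expects its input in the shape [seq_len, batch_size, input_size], so data prepared as [batch_size, seq_len, 2] must either be permuted or the module must be constructed with batch_first=True. At every timestep t, the RNN receives the slice [t, :, :], a matrix of shape [batch_size, input_size] that contains the observation at timestep t from every sequence in the batch.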
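A minimal sketch of both options, in case it helps. The sizes (batch_size=4, seq_len=10, hidden_size=8) are made-up values for illustration, and the random tensor stands in for your real data; only input_size=2 comes from the question.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions); only input_size=2 comes from the question.
batch_size, seq_len, input_size, hidden_size = 4, 10, 2, 8

# Stand-in for data prepared as [batch_size, seq_len, 2].
x = torch.randn(batch_size, seq_len, input_size)

# Option 1: default layout. nn.RNN expects [seq_len, batch_size, input_size],
# so swap the first two axes before the forward pass.
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, bidirectional=True)
out, h_n = rnn(x.permute(1, 0, 2))
print(out.shape)  # [seq_len, batch_size, 2 * hidden_size]
print(h_n.shape)  # [2 directions, batch_size, hidden_size]

# Option 2: keep the data batch-first and tell the module about it.
rnn_bf = nn.RNN(input_size=input_size, hidden_size=hidden_size,
                bidirectional=True, batch_first=True)
out_bf, h_n_bf = rnn_bf(x)
print(out_bf.shape)  # [batch_size, seq_len, 2 * hidden_size]
```

Note that batch_first only changes the layout of the input and output tensors; the hidden state h_n keeps the batch in its second dimension either way.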