Getting data into the right shape from NumPy arrays for an RNN

Hi guys, I am pretty much a newbie here. I need to feed a time series to an RNN, but I am having trouble getting the data into the right shape to feed it to torch.nn.RNN.

My data is structured as follows: I have two signals, sampled at 10000 Hz, and a target with the same sampling rate. These are stored in NumPy arrays, say x1, x2, y. I have 128 seconds of data, so 128*10000 points for each input and for the target. Now, I want to feed the RNN a 1-second-long time series, compute the gradient, and then move on to the next second.

torch.nn.RNN needs the input tensor to have three dimensions. How should I put together the input arrays x1, x2 and reshape them along with the target y? I know it is probably trivial, but I am a bit confused about it. Say x1, x2, y are [1, 1280000] NumPy arrays to begin with.

You should pass the input as [seq_len, batch_size, features] in the default setup, or [batch_size, seq_len, features] if you specify batch_first=True.
In your case, using the default layout and treating the whole recording as a single sequence, this would be [1280000, batch_size, 2].
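As a minimal sketch (assuming x1 and x2 can be squeezed to 1-D float arrays, and with a hypothetical hidden size of 16), stacking the two signals into that layout and feeding the first second could look like this:

```python
import numpy as np
import torch

fs = 10000                                             # sampling rate in Hz
x1 = np.random.randn(1, 128 * fs).astype(np.float32)   # placeholder signals
x2 = np.random.randn(1, 128 * fs).astype(np.float32)

# Stack the two signals along a feature axis: [1280000, 2]
features = np.stack([x1.squeeze(), x2.squeeze()], axis=-1)

# Add a batch dimension to get the default [seq_len, batch_size, features]
inputs = torch.from_numpy(features).unsqueeze(1)        # [1280000, 1, 2]

rnn = torch.nn.RNN(input_size=2, hidden_size=16)        # hidden_size is arbitrary here
output, h_n = rnn(inputs[: 1 * fs])                     # feed only the first second
print(output.shape)                                     # torch.Size([10000, 1, 16])
```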


So what if I want to feed, say, one second and then the next one, and then compute the loss and gradients over the batch of 2 seconds? Would that mean batch_size = 2 and seq_len = 1280000/2?

So the batch would contain two samples, each 1 second long?
If so, you would pass [seq_len=1*10000, batch_size=2, features=2].
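A minimal sketch of that reshaping (using a random stand-in for the stacked x1/x2 array from above) would be to split the 128-second recording into per-second samples and put any two of them into the batch dimension:

```python
import numpy as np
import torch

fs = 10000
features = np.random.randn(128 * fs, 2).astype(np.float32)  # stand-in for stacked x1/x2

# Split into 128 one-second samples: [128, 10000, 2], then move the
# sample axis to the batch position: [seq_len=10000, batch, features=2]
seconds = torch.from_numpy(features).reshape(128, fs, 2).permute(1, 0, 2)

batch = seconds[:, :2, :]          # two consecutive seconds as one batch
rnn = torch.nn.RNN(input_size=2, hidden_size=16)
output, h_n = rnn(batch)
print(batch.shape, output.shape)   # torch.Size([10000, 2, 2]) torch.Size([10000, 2, 16])
```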