How to predict sequences using a vanilla RNN

For some context, I have a set of 37 playlists, each 12 tracks long. Each track has been hand-selected: early songs in a playlist are generally more chilled, and the tempo increases as the playlist progresses. I decided to commit to a project and build a deep playlist generator.

I am implementing a many-to-many vanilla RNN in PyTorch and am seeking clarity on how to train it one batch (playlist) at a time, where each track is fed through the model in turn and the model predicts the features of the next track. MAE (L1Loss) is then used to calculate the loss at each track.

Pictured is a many-to-many RNN. In this case, each red box is the current track's features and the blue box opposite it is the predicted next track's features:

[Image: many-to-many RNN]

The playlist dataset has been processed into a tensor of shape torch.Size([37, 12, 18]) with stride (12, 1, 444): 37 playlists, each 12 tracks long, with 9 X_features + 9 y_features (18) per track.
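
To make the layout concrete, here is a minimal sketch of how I understand the split into inputs and targets. It assumes the first 9 values along the last dimension are the X features and the last 9 are the y features, and it uses random data as a stand-in for the real playlists:

```python
import torch

# Stand-in for the real dataset: 37 playlists, 12 tracks, 18 values per track.
playlists = torch.randn(37, 12, 18)

# Assumption: columns 0-8 hold the current track's features (X),
# columns 9-17 hold the next track's features (y).
X = playlists[:, :, :9]
y = playlists[:, :, 9:]

print(X.shape)  # torch.Size([37, 12, 9])
print(y.shape)  # torch.Size([37, 12, 9])
```

Note that the slice is taken on the last (feature) dimension, not on the track dimension.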

My RNN class looks like this:

class RNNEstimator(nn.Module):
    def __init__(self, input_size=9, hidden_size=30, output_size=9):
        super(RNNEstimator, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, inp, hidden):
        print("inp", inp.shape)
        print("hid", hidden.shape)
        combined = torch.cat((inp, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

This is taken from the PyTorch tutorials page. However, I have adapted the RNN class to output 9 features rather than a binary classification.

The train_rnn function looks like:

# Model Initiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RNNEstimator(9, 30, 9)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.L1Loss()

# Training function for RNN
def train_rnn(model, train_loader, epochs, criterion, optimizer, device):
    model.train() # Make sure that the model is in training mode.

    # training loop is provided
    for epoch in range(1, epochs + 1):

        for batch in train_loader:
            total_loss = 0
            # get data
            batch_x = batch[:, :9, :].float().squeeze()
            batch_y = batch[:, 9:, :].float()

            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            hidden = model.initHidden()       
            # For each track in batch/playlist
            # TODO: THIS NEEDS WORK
            for x, y in zip(batch_x, batch_y):
                output, hidden = model(x, hidden)
                loss = criterion(output, y)
                total_loss += loss
        if epoch % 10 == 0:
            print('Epoch: {}/{}.............'.format(epoch, epochs), end=' ')
            print("Loss: {:.4f}".format(loss.item()))

How do I train this model one playlist per batch and in a many-to-many fashion for each track?

The model should take each track (t) via the forward method and output the next track (t+1). The hidden state should be reset for each playlist, since playlists are independent of one another. Or perhaps there is a better way.
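
A minimal sketch of the loop I have in mind is below. It is not a definitive implementation. It assumes one playlist per step, random placeholder data instead of my real tensor, the first 9 values per track as inputs and the last 9 as targets, and the per-track losses summed before a single backward pass per playlist:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class RNNEstimator(nn.Module):
    def __init__(self, input_size=9, hidden_size=30, output_size=9):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, inp, hidden):
        # Both inp and hidden must be 2-D: (1, input_size) and (1, hidden_size).
        combined = torch.cat((inp, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

# Placeholder data (assumption): 37 playlists, 12 tracks, 9 X + 9 y features.
data = torch.randn(37, 12, 18)

model = RNNEstimator(9, 30, 9)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.L1Loss()

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for playlist in data:                      # playlist: (12, 18)
        x_seq, y_seq = playlist[:, :9], playlist[:, 9:]
        hidden = model.initHidden()            # reset state for each playlist
        optimizer.zero_grad()
        loss = 0.0
        for x, y in zip(x_seq, y_seq):         # x, y: (9,)
            # unsqueeze(0) adds the batch dimension expected by forward().
            output, hidden = model(x.unsqueeze(0), hidden)
            loss = loss + criterion(output, y.unsqueeze(0))
        loss.backward()                        # backprop through all 12 steps
        optimizer.step()
        running += loss.item()
    epoch_losses.append(running / len(data))
    print("Epoch {}: mean playlist loss {:.4f}".format(epoch + 1, epoch_losses[-1]))
```

Summing the losses and calling backward() once per playlist is one option; averaging them instead would only rescale the gradient.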

I am getting an error from the cat function like so:

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
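
As far as I can tell, the error can be reproduced in isolation when a 1-D track tensor is concatenated with the 2-D hidden state along dimension 1. The sketch below assumes a single track of 9 features and the (1, 30) hidden state from initHidden(); the exact exception type and message may vary between PyTorch versions:

```python
import torch

x = torch.randn(9)            # one track's features, 1-D: shape (9,)
hidden = torch.zeros(1, 30)   # shape (1, 30), as returned by initHidden()

try:
    # A 1-D tensor has no dimension 1, so this concatenation fails.
    torch.cat((x, hidden), 1)
    failed = False
except (IndexError, RuntimeError) as err:
    failed = True
    print(type(err).__name__, err)

# Adding a leading batch dimension makes both tensors 2-D.
combined = torch.cat((x.unsqueeze(0), hidden), 1)
print(combined.shape)  # torch.Size([1, 39])
```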