For some context, I have a set of 37 playlists of 12 tracks long. Each track has been hand-selected in a certain way. Early songs in the playlist are generally more chilled and as the playlist progresses tracks begin to increase in tempo. I decided to commit to a project and build a deep playlist generator.
I am implementing a many-to-many vanilla RNN in PyTorch and am seeking clarity on how to train the RNN one batch/playlist at a time, where each track is then parsed and the model predicts the features of the next track. MAE/L1loss is then used to calculate the loss at each track.
Pictured is a Many-to-many RNN - for this case - each red box is the current track’s features and the opposite blue box is the predicted next track’s features:
The playlist dataset has been processed into a tensor of shape torch.Size([37, 12, 18]), and stride (12, 1, 444)) - 37 playlist, 12 tracks longs with 9 X_features + 9 y_features (18).
For my RNN Class it looks like so:
class RNNEstimator(nn.Module):
def __init__(self, input_size=9, hidden_size=30, output_size=9):
super(RNNEstimator, self).__init__()
self.hidden_size = hidden_size
self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
self.i2o = nn.Linear(input_size + hidden_size, output_size)
def forward(self, inp, hidden):
print("inp", inp.shape)
print("hid", hidden.shape)
combined = torch.cat((inp, hidden), 1)
hidden = self.i2h(combined)
output = self.i2o(combined)
return output, hidden
def initHidden(self):
return torch.zeros(1, self.hidden_size)
This is taken from the PyTorch tutorials page. However, I have adapted the RNN Class to output 9 features rather than a binary classification.
The train_rnn function looks like:
# Model Initiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RNNEstimator(9, 30, 9)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.L1Loss()
# Training function for RNN
def train_rnn(model, train_loader, epochs, criterion, optimizer, device):
model.train() # Make sure that the model is in training mode.
# training loop is provided
for epoch in range(1, epochs + 1):
for batch in train_loader:
total_loss = 0
# get data
batch_x = batch[:, :9, :].float().squeeze()
batch_y = batch[:, 9:, :].float()
batch_x = batch_x.to(device)
batch_y = batch_y.to(device)
optimizer.zero_grad()
hidden = model.initHidden()
# For each track in batch/playlist
# TODO: THIS NEEDS WORK
for x, y in zip(batch_x, batch_y):
output, hidden = model(x, hidden)
loss = criterion(output, y)
loss.backward()
optimizer.step()
total_loss += loss.data.item()
if epoch % 10 == 0:
print('Epoch: {}/{}.............'.format(epoch, epochs), end=' ')
print("Loss: {:.4f}".format(loss.item()))
How do I train this model one playlist per batch and in a many-to-many fashion for each track?
The model should parse each track (t) - via the forward method - then output the next track (t+1). The hidden state will reset each playlist given they are independent of one another. Or perhaps there is a better way.
I am getting an error from the cat function like so:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)