RNN Encoder-Decoder only predicting values -1 to 1

Hello all. I am trying to train an encoder-decoder network to predict a float value at each timestep given audio features. The network seems to be training, but its predictions never go below -1, while my targets go down to -10. I am an amateur machine learning practitioner, just starting to grasp RNNs.

I assume I am making a mistake with my architecture or with my loss function. Can anybody take a look and let me know what I am doing wrong?

Inputs are of shape [1, 40, 121] (batch, features, timesteps) and targets are of shape [1, 121].
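
(train_dl in the code below is my real feature/target loader; if you want to run the snippet yourself, a random stand-in with the same shapes would be something like this:)

from torch.utils.data import TensorDataset, DataLoader
import torch

xs = torch.randn(200, 40, 121)    # 200 dummy examples: 40 features x 121 timesteps
ys = torch.randn(200, 121) * 5.0  # dummy targets, roughly the scale of my real ones
train_dl = DataLoader(TensorDataset(xs, ys), batch_size=1)  # yields x [1, 40, 121], y [1, 121]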

from torch import nn
import torch.nn.functional as F
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
CUDA = torch.cuda.is_available()
bs = 1            # batch size (one sequence per batch)

input_size = 40   # audio features per timestep
hidden_size = 121
output_size = 1

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=False)
        self.hidden = torch.zeros(1, bs, hidden_size).to(device)  # fixed initial hidden state

    def forward(self, input, hidden):
        output = input.squeeze(dim=1)
        output = output.view(121, -1, 40)
        output, hidden = self.gru(output, hidden)

        return output, hidden

class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=False)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        output = F.relu(input)
        output, hidden = self.gru(output, hidden)
        output = self.linear(output[0])

        return output, hidden

class EDRNN(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        self.enc = EncoderRNN(input_size, hidden_size)
        self.dec = DecoderRNN(hidden_size, output_size)

        self = self.cuda() if CUDA else self

    def forward(self, xb):
        self.zero_grad()
        enc_out, hidden = self.enc(xb, self.enc.hidden)
        dec_out, hidden = self.dec(enc_out, hidden)
        return hidden.squeeze(dim=0)

    def decode(self, x): # for inference
        pass

model = EDRNN(input_size, output_size)

criterion = nn.L1Loss(reduction="none")

enc_optim = torch.optim.Adam(model.enc.parameters(), lr=0.001)
dec_optim = torch.optim.Adam(model.dec.parameters(), lr=0.001)

for epoch in range(100):
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)  # keep the data on the same device as the model
        model.zero_grad()
        output = model(x)
        loss = criterion(output, y.squeeze(dim=1))
        loss.sum().backward()
        enc_optim.step()
        dec_optim.step()

print(loss)

A couple of things that strike me as odd:

  • You do output = output.view(121, -1, 40) and then push output through your GRU layer. But you defined your GRU layer with batch_first=False, which means the 121 will be interpreted as the sequence length. However, 121 seems to be your hidden_size.

  • I’m not sure if using view() is the right way to get the shapes right; have a look at a post of mine (and at the sketch below).

In short, I’m pretty sure you are messing up your inputs, so the network can’t learn anything meaningful.
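
To make that concrete: with batch_first=False the GRU expects (seq_len, batch, input_size). Assuming your data really is (batch=1, 40 features, 121 timesteps), a permute() gets you there without scrambling the values, whereas view() only reinterprets the memory layout:

import torch

x = torch.randn(1, 40, 121)       # (batch, features, timesteps)
x_seq = x.permute(2, 0, 1)        # (121, 1, 40): 121 timesteps, batch of 1, 40 features
x_bad = x.view(121, -1, 40)       # same shape, but timesteps and features get mixed up
print(torch.equal(x_seq, x_bad))  # False -- the values end up in different places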

I’ll take a look. Thank you. I’m a bit unsure of my inputs… Right now my sequence length and hidden_size are both 121, which I suppose is likely to lead to confusion more than help the model.

Ah, OK. But hidden_size is not data dependent, and you can set it to any other value. I was actually surprised it’s 121, since most people go for round values (e.g., 50, 100, 200) or powers of 2 (e.g., 64, 128, 256, 512).

Either way, output = output.view(121, -1, 40) seems off. It kind of implies that you don’t know your batch size but have to infer it with -1. This can’t be right :)
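
Just to illustrate (the 128 here is an arbitrary example, not a recommendation): hidden_size is a free hyperparameter, and the batch size can be read straight off the input tensor instead of being inferred with -1:

import torch
from torch import nn

gru = nn.GRU(input_size=40, hidden_size=128, batch_first=False)

x = torch.randn(1, 40, 121)           # (batch, features, timesteps)
batch_size = x.size(0)                # read the batch size, don't infer it with -1
x_seq = x.permute(2, 0, 1)            # (121, batch_size, 40)
h0 = torch.zeros(1, batch_size, 128)  # (num_layers, batch_size, hidden_size)
out, h = gru(x_seq, h0)               # out: (121, batch_size, 128), h: (1, batch_size, 128)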

Edit: Deleted for now while I try to work this out on my own :)

Thank you for your help, Chris. I’ve gotten my model to run and train. When there are a few parts you don’t understand, it’s very helpful to know how one of them should look!
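
For anyone who finds this thread later: this isn’t my exact final code, but the fix was essentially to swap the view() reshape in the encoder for a permute(), along these lines:

from torch import nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=False)

    def forward(self, input, hidden):
        # (batch, features, timesteps) -> (timesteps, batch, features)
        output = input.permute(2, 0, 1)
        output, hidden = self.gru(output, hidden)
        return output, hidden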

How might I implement delayed-target learning in this kind of model?