GRU Time Series Autoencoder

Hi all,
Issue:
I'm trying to port a working GRU autoencoder (AE) for biosignal time series from Keras to PyTorch, without success.

The model has 2 GRU layers.
The 1st is bidirectional.
The 2nd is not.
I take the output of the 2nd layer and repeat it "seq_len" times before passing it to the decoder.
The decoder ends with a linear layer and a ReLU activation (samples are normalized to [0, 1]).

I have already tried:

  • Passing the hidden state of the 2nd layer to the decoder instead of the output.
  • Removing and keeping the grad-clip functions.
  • Removing the 1st layer in the decoder (gru_dec1).

I welcome any suggestions/advice

Keras model:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Bidirectional, GRU, RepeatVector, TimeDistributed

inputs = Input(shape=(t, in_channels))
encoded = Bidirectional(GRU(256, return_sequences=True))(inputs)
encoded = GRU(32)(encoded)
decoded = RepeatVector(Signal_Len)(encoded)
decoded = GRU(32, return_sequences=True)(decoded)
decoded = Bidirectional(layers.GRU(256, return_sequences=True))(decoded)
decoded = TimeDistributed(layers.Dense(in_channels, activation=tf.nn.relu,
                                       bias_initializer=b_init))(decoded)
Optimizer
opt = tf.keras.optimizers.RMSprop()
Loss
model_autoencoder.compile(optimizer=opt, loss='mse', metrics=['mae', 'mse'])

PyTorch model
Encoder

import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, n_features, latent_dim, hidden_size):
        super(EncoderRNN, self).__init__()

        self.n_features = n_features
        self.hidden_size = hidden_size
        self.latent_dim = latent_dim

        # Bidirectional GRU over the raw signal
        self.gru_enc = nn.GRU(n_features, hidden_size,
                              batch_first=True, dropout=0,
                              bidirectional=True)

        # Unidirectional GRU compressing to the latent dimension
        self.lat_layer = nn.GRU(hidden_size * 2, latent_dim,
                                batch_first=True, dropout=0,
                                bidirectional=False)

    def forward(self, x):
        x, _ = self.gru_enc(x)
        x, h = self.lat_layer(x)
        # Keep only the last time step: (batch, 1, latent_dim)
        return x[:, -1].unsqueeze(1)
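
For reference, a quick shape check on the encoder (with assumed example sizes, since the actual batch size, sequence length and channel count aren't given; 256 and 32 mirror the Keras GRU(256)/GRU(32)):

# Shape check, sketch only: batch=8, seq_len=1000 and n_features=2 are made up.
enc = EncoderRNN(n_features=2, latent_dim=32, hidden_size=256)
x = torch.randn(8, 1000, 2)   # (batch, seq_len, n_features)
z = enc(x)
print(z.shape)                # torch.Size([8, 1, 32]), one time step of latent_dim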

Decoder

class DecoderRNN(nn.Module):
    def __init__(self, seq_len, n_features, latent_dim, hidden_size):
        super(DecoderRNN, self).__init__()

        self.seq_len = seq_len
        self.n_features = n_features
        self.latent_dim = latent_dim
        self.hidden_size = hidden_size

        self.gru_dec1 = nn.GRU(latent_dim, latent_dim,
                               batch_first=True, dropout=0,
                               bidirectional=False)

        self.gru_dec2 = nn.GRU(latent_dim, hidden_size,
                               batch_first=True, dropout=0,
                               bidirectional=True)

        self.output_layer = nn.Linear(self.hidden_size * 2, n_features, bias=True)
        self.act = nn.ReLU()

    def forward(self, x):
        # Repeat the latent vector seq_len times: (batch, seq_len, latent_dim)
        x = x.repeat(1, self.seq_len, 1)
        x, _ = self.gru_dec1(x)
        x, _ = self.gru_dec2(x)
        return self.act(self.output_layer(x))

class AERNN(nn.Module):
    def __init__(self, seq_len, n_features, latent_dim, hidden_size):
        super(AERNN, self).__init__()

        self.seq_len = seq_len
        self.encoder = EncoderRNN(n_features, latent_dim, hidden_size).to(device)
        self.decoder = DecoderRNN(seq_len, n_features, latent_dim, hidden_size).to(device)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = AERNN(seq_len, n_features, latent_dim, hidden_size)
model.apply(bias_init)
model = model.to(device)
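
As a quick end-to-end sanity check (same assumed sizes as above), the reconstruction should come back with the same shape as the input:

x = torch.randn(8, seq_len, n_features).to(device)
recon = model(x)
print(x.shape, recon.shape)   # both should be (8, seq_len, n_features)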

optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss().to(device)

TRAIN

    for epoch in range(100):
        #Train
        model.train()
        train_loss = 0
        for x,_  in train_dl:

            x = x.cuda()
            optimizer.zero_grad()
            recon = model(x)
            loss = loss_fn(recon[:,:,0], x[:,:,0])
            train_loss += loss.item()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
            torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
            
            optimizer.step()
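
        # Validation pass (sketch, not shown above): assumes a val_dl DataLoader
        # with the same layout as train_dl; same channel-0 MSE, no weight updates.
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for x, _ in val_dl:
                x = x.cuda()
                recon = model(x)
                val_loss += loss_fn(recon[:, :, 0], x[:, :, 0]).item()
        print(f"Epoch:{epoch} Train_Loss: {train_loss/len(train_dl):.8f} "
              f"Val_Loss: {val_loss/len(val_dl):.8f}")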

It's not quite clear what you're asking. What is not working? Do you get errors? Does the loss not go down? Are you generally not happy with the accuracy? Is the accuracy worse compared to the Keras model?

When I build my autoencoders, I usually start with the most basic setup, see if it works (no errors, loss goes down, able to overtrain it on a small dataset, etc.), and then step by step add complexity to the model, checking each time whether it still works. Getting no errors is usually the easy part, but that doesn't mean it's correct.
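
For example, a minimal sketch of that "overtrain on a tiny dataset" check, reusing the model, loss_fn, optimizer and train_dl from your post:

# Sanity check (sketch): take one small batch and try to drive the loss to ~0.
xb, _ = next(iter(train_dl))
xb = xb.cuda()
model.train()
for step in range(500):
    optimizer.zero_grad()
    recon = model(xb)
    loss = loss_fn(recon[:, :, 0], xb[:, :, 0])
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss.item():.6f}")
# If this loss does not get close to zero, the issue is in the model/loss, not the data.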

I'm not saying the model is wrong, but it's definitely not the classic RNN-based encoder-decoder model. There the encoder, well, encodes your sequence into some latent representation (typically the last hidden state), which is then the "seed" hidden state for the decoder. The decoder then generates, step by step, the next output item and the next hidden state from the current hidden state. In code that usually involves some loop.
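
To make the loop idea concrete, here is a minimal sketch of that step-by-step decoding pattern (names like decoder_cell and out_proj are placeholders, not from your code; h_enc would be the encoder's final hidden state):

import torch

def decode(h_enc, seq_len, decoder_cell, out_proj):
    # h_enc: (1, batch, latent_dim) final hidden state of the encoder GRU
    # decoder_cell: nn.GRUCell(n_features, latent_dim)
    # out_proj: nn.Linear(latent_dim, n_features)
    h = h_enc.squeeze(0)                                   # (batch, latent_dim) seed state
    y = torch.zeros(h.size(0), out_proj.out_features, device=h.device)  # start frame
    outputs = []
    for _ in range(seq_len):
        h = decoder_cell(y, h)                             # one GRU step on the previous output
        y = out_proj(h)                                    # predict the next frame
        outputs.append(y)
    return torch.stack(outputs, dim=1)                     # (batch, seq_len, n_features)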

In your code, you copy/repeat the last hidden state (ignoring the linear layer for simplicity) and give that sequence to your decoder GRU. This doesn't really make sense, since that sequence has the same item at every time step.

You may want to look at my code for an autoencoder and a variational autoencoder (VAE). The context is text (NLP), but that doesn't matter. I essentially started with the basic machine translation / seq2seq model, only that the input sentence and output sentence are the same, and then I just tweaked some stuff. They both train fine, with the VAE being inherently much more difficult to train.


Hi @vdw, thanks for your reply.

Sorry for the lack of details you remarked on.

To be more clear:

  • About the loss: it doesn't decrease; it stays constant:
Epoch:0 Patience: 0 Train_Loss: 0.00085809 Val_Loss: 0.00086563
Epoch:1 Patience: 0 Train_Loss: 0.00082885 Val_Loss: 0.00086676
Epoch:2 Patience: 0 Train_Loss: 0.00082884 Val_Loss: 0.00086649
Epoch:3 Patience: 0 Train_Loss: 0.00082886 Val_Loss: 0.00086577
Epoch:4 Patience: 0 Train_Loss: 0.00082882 Val_Loss: 0.00086545
Epoch:5 Patience: 0 Train_Loss: 0.00082893 Val_Loss: 0.00086574

To give more context:

I'm working with bio-signals in a steady state.
I decided to use "repeat" thinking that the whole signal could be represented by the output of the encoder (a compressed representation of it). Then the decoder, through the hidden state and its own last output, could do the rest.

See below a random example of reconstruction with Keras during training, at epochs 1, 20, 50, 100 and 180 respectively. It seems fine to me.

Anyway, thank you for the links; I'll check them and keep trying to obtain a similar result.

[Image: Keras reconstructions at epochs 1, 20, 50, 100 and 180]
