Timeseries classification (using LSTM-CNN) training loss not decreasing even after increasing model size

Hi,

I have an LSTM-CNN model that I am training on my timeseries data.
My training loss is not decreasing much. I also tried increasing the size of the model, but the training loss still does not decrease.

I have already done scaling and resampling.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMNet(nn.Module):

    def __init__(self, size, input_shape):
        super(LSTMNet, self).__init__()
        # LSTM branch over the raw sequence (batch, timesteps, features)
        self.lstm = nn.LSTM(input_size=input_shape[-1], hidden_size=size, batch_first=True)
        # Conv branch over the channel-first view (batch, features, timesteps)
        self.conv1 = nn.Conv1d(input_shape[-1], size, kernel_size=8, padding=4)
        self.bn1 = nn.BatchNorm1d(size)
        self.conv2 = nn.Conv1d(size, size * 2, kernel_size=5, padding=2)
        self.bn2 = nn.BatchNorm1d(size * 2)
        self.conv3 = nn.Conv1d(size * 2, size, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm1d(size)
        self.pooling = nn.AdaptiveAvgPool1d(1)
        # each branch contributes `size` features to the concatenated vector
        self.fc = nn.Linear(size * 2, 1)

    def forward(self, x):
        x_lstm, _ = self.lstm(x)
        x_conv = self.conv1(x.permute(0, 2, 1))
        x_conv = self.bn1(x_conv)
        x_conv = F.relu(x_conv)
        x_conv = self.conv2(x_conv)
        x_conv = self.bn2(x_conv)
        x_conv = F.relu(x_conv)
        x_conv = self.conv3(x_conv)
        x_conv = self.bn3(x_conv)
        x_conv = F.relu(x_conv)
        x_conv = self.pooling(x_conv)
        # squeeze only the pooled time dim so a batch of size 1 keeps its batch dim
        x = torch.cat((x_lstm[:, -1, :], x_conv.squeeze(-1)), dim=1)
        x = self.fc(x)
        x = torch.sigmoid(x)
        return x
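
A quick shape check of the model (the batch size, sequence length, and feature count below are placeholder values for illustration):

model = LSTMNet(size=8, input_shape=(32, 100, 3))  # 3 input features
x = torch.randn(32, 100, 3)                        # (batch, timesteps, features)
out = model(x)
print(out.shape)  # torch.Size([32, 1]) -- one probability per sample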

I have tried size values of 8, 16, 32, and 64. In all cases the loss is almost the same.
Can someone please let me know how I can improve the training loss?

Below are the loss values for size = 8.

Epoch 2/20
15355/15355 [==============================] - 274s 18ms/step - loss: 0.5384 - val_loss: 0.5737
Epoch 3/20
15355/15355 [==============================] - 274s 18ms/step - loss: 0.5363 - val_loss: 0.5407
Epoch 4/20
15355/15355 [==============================] - 270s 18ms/step - loss: 0.5351 - val_loss: 0.5592
Epoch 5/20
15355/15355 [==============================] - 278s 18ms/step - loss: 0.5343 - val_loss: 0.5519
Epoch 6/20
15355/15355 [==============================] - 291s 19ms/step - loss: 0.5335 - val_loss: 0.5540
Epoch 7/20
15355/15355 [==============================] - 382s 25ms/step - loss: 0.5331 - val_loss: 0.5734
Epoch 8/20
15355/15355 [==============================] - 479s 31ms/step - loss: 0.5327 - val_loss: 0.5495
Epoch 9/20
15355/15355 [==============================] - 432s 28ms/step - loss: 0.5323 - val_loss: 0.5369
Epoch 10/20
15355/15355 [==============================] - 234s 15ms/step - loss: 0.5319 - val_loss: 0.5354
Epoch 11/20
15355/15355 [==============================] - 245s 16ms/step - loss: 0.5316 - val_loss: 0.5340
Epoch 12/20
15355/15355 [==============================] - 276s 18ms/step - loss: 0.5313 - val_loss: 0.5501
Epoch 13/20
15355/15355 [==============================] - 293s 19ms/step - loss: 0.5311 - val_loss: 0.5364
Epoch 14/20
15355/15355 [==============================] - 287s 19ms/step - loss: 0.5308 - val_loss: 0.5518
Epoch 15/20
15355/15355 [==============================] - 266s 17ms/step - loss: 0.5306 - val_loss: 0.5488
Epoch 16/20
15355/15355 [==============================] - 281s 18ms/step - loss: 0.5304 - val_loss: 0.5515
Epoch 17/20
15355/15355 [==============================] - 261s 17ms/step - loss: 0.5302 - val_loss: 0.5446
Epoch 18/20
15355/15355 [==============================] - 344s 22ms/step - loss: 0.5301 - val_loss: 0.5375
Epoch 19/20
15355/15355 [==============================] - 267s 17ms/step - loss: 0.5299 - val_loss: 0.5204
Epoch 20/20
15355/15355 [==============================] - 256s 17ms/step - loss: 0.5297 - val_loss: 0.5351

It is improving, albeit slowly. Have you tried adjusting your learning rate or using other optimizers?

I tried using a callback for learning rate. It did not help.

No idea what you mean by “callback for learning rate”. Can you share your optimizer instantiation and training regime?

from pickle import load

import tensorflow as tf
# CuDNNLSTM is a TF 1.x layer; in TF 2 use tf.keras.layers.LSTM,
# which uses the cuDNN kernel automatically when possible
from tensorflow.keras.layers import (Input, CuDNNLSTM, Permute, Conv1D,
                                     BatchNormalization, Activation,
                                     GlobalAveragePooling1D, concatenate, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint

# idx and ep are defined earlier in the script
X_train = load(open(r"X_train_" + str(idx) + ".pkl", 'rb'))
y_train = load(open(r"y_train_" + str(idx) + ".pkl", 'rb'))
X_test = load(open(r"X_test.pkl", 'rb'))
y_test = load(open(r"y_test.pkl", 'rb'))

units = [8]

for size in units:
    ip = Input(shape=X_train.shape[1:])
    # LSTM branch on the raw sequence
    x = CuDNNLSTM(size, return_sequences=False)(ip)
    # Conv branch on the permuted input
    y = Permute((2, 1))(ip)
    y = Conv1D(size, 8, padding='same', kernel_initializer='he_uniform')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv1D(size * 2, 5, padding='same', kernel_initializer='he_uniform')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv1D(size, 3, padding='same', kernel_initializer='he_uniform')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = GlobalAveragePooling1D()(y)
    x = concatenate([x, y])
    out = Dense(1, activation='sigmoid')(x)
    model = Model(ip, out)
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=[
                      # tf.keras.metrics.Precision(),
                      # tf.keras.metrics.Recall()
                  ])
    # stop when the training loss stalls; checkpoint on the best validation loss
    es = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
    mc = ModelCheckpoint("lstm_cnn_resample_" + str(size) + "_" + str(idx) + ".h5",
                         monitor='val_loss', mode='min', save_best_only=True)
    model.fit(X_train, y_train, epochs=ep, batch_size=32,
              validation_data=(X_test, y_test), callbacks=[mc, es])

I am porting the code to PyTorch; the above is my TensorFlow implementation.

If you know of anything I can apply to this kind of architecture, please let me know.

As you mentioned, that code is all Keras/TensorFlow. Additionally, your optimizer does not have a learning rate specified, so it’s likely using the default.
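
For example, you could set the learning rate explicitly and add a ReduceLROnPlateau callback (this reuses model, X_train, etc. from your snippet; the values below are just starting points, not tuned recommendations):

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # `lr` in older TF 1.x
              loss=tf.keras.losses.BinaryCrossentropy())

# halve the learning rate whenever the validation loss plateaus
rlr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                           patience=2, min_lr=1e-6)
model.fit(X_train, y_train, epochs=ep, batch_size=32,
          validation_data=(X_test, y_test), callbacks=[mc, es, rlr])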

If you’re debugging a TensorFlow script, the TensorFlow forum would be the optimal place to do it.

Based on your PyTorch code I assume you are using nn.BCELoss as the criterion and applying torch.sigmoid to the outputs. Could you remove the torch.sigmoid call and use nn.BCEWithLogitsLoss for better numerical stability, and check if this helps the model train?
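
A minimal sketch of that change (random tensors stand in for your model outputs and targets):

import torch
import torch.nn as nn

# the model should now return raw logits, i.e. drop the final torch.sigmoid
criterion = nn.BCEWithLogitsLoss()  # fuses sigmoid + BCE in a numerically stable way

logits = torch.randn(32, 1, requires_grad=True)  # stand-in for model(inputs)
targets = torch.randint(0, 2, (32, 1)).float()   # binary labels as floats

loss = criterion(logits, targets)
loss.backward()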

Hi,
Thanks! With this loss function the loss has dropped to about a quarter of its previous value.
Do I then need to add a sigmoid after the logits layer in the test model?
Would it work that way?

As far as I know, yes. If you trained with nn.BCEWithLogitsLoss, you will have to apply the sigmoid (and a threshold) at inference time.

If you want to use the probabilities before applying a threshold to create class predictions, you can still use torch.sigmoid (just don’t pass its output to nn.BCEWithLogitsLoss).
You could also use the raw logits and map the probability threshold to a logit threshold, as explained in this post by @KFrank.
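
For example (the logits tensor below stands in for your model output):

import torch

logits = torch.tensor([[-1.2], [0.3], [2.5]])  # stand-in for model(x)

probs = torch.sigmoid(logits)  # probabilities in (0, 1)
preds = (probs > 0.5).long()   # class predictions via a probability threshold

# equivalent: threshold the raw logits at 0.0, since sigmoid(0) == 0.5
preds_from_logits = (logits > 0.0).long()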