CNN-LSTM performance identical to LSTM

I am trying to recreate the models from a study in which a CNN-LSTM outperformed a plain LSTM, but my CNN-LSTM produces nearly identical results to the LSTM, so the added convolutional layers do not seem to have any effect. The study describes the CNN-LSTM model like this:

The model is constructed by a single LSTM layer and two CNN layers. To form the CNN part, two 1D convolutional neural networks are stacked without any pooling layer. The second CNN layer is followed by a Rectified Linear Unit (ReLU) activation function. Each of the flattened output of the CNN’s ReLU layer and the LSTM layer is projected to the same dimension using a fully connected layer. Finally, a dropout layer is placed before the output layer.
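
For comparison, here is a minimal sketch of one possible reading of that description, in which the CNN part and the LSTM each see the raw input and are each projected by their own fully connected layer. This is only my interpretation, not the study's code: the class name ParallelCNNLSTM, the kernel size of 3, the padding, treating features as channels, and concatenating the two projections are all assumptions, since the description does not say how the two branches are combined.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelCNNLSTM(nn.Module):
    """Hypothetical two-branch reading of the description; not the study's code."""

    def __init__(self, input_size, seq_len, n_filters, lstm_hidden,
                 dense_hidden, output_size, kernel_size=3):
        super().__init__()
        # CNN branch: two stacked Conv1d layers, no pooling, ReLU after the second.
        # Features are treated as channels, so the kernel slides over time steps.
        pad = kernel_size // 2  # keeps the sequence length for odd kernel sizes
        self.c1 = nn.Conv1d(input_size, n_filters, kernel_size, padding=pad)
        self.c2 = nn.Conv1d(n_filters, n_filters, kernel_size, padding=pad)
        self.fc_cnn = nn.Linear(n_filters * seq_len, dense_hidden)
        # LSTM branch: a single layer reading the raw sequence.
        self.lstm = nn.LSTM(input_size, lstm_hidden, num_layers=1, batch_first=True)
        self.fc_lstm = nn.Linear(seq_len * lstm_hidden, dense_hidden)
        self.dropout = nn.Dropout(p=0.4)
        # Assumption: the two projections are concatenated before the output layer.
        self.out = nn.Linear(2 * dense_hidden, output_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        cnn = self.c2(self.c1(x.transpose(1, 2)))  # (batch, n_filters, seq_len)
        cnn = self.fc_cnn(torch.flatten(F.relu(cnn), start_dim=1))
        lstm_out, _ = self.lstm(x)  # (batch, seq_len, lstm_hidden)
        lstm = self.fc_lstm(torch.flatten(lstm_out, start_dim=1))
        merged = torch.cat([cnn, lstm], dim=1)
        return self.out(self.dropout(merged))


# Illustrative sizes only.
model = ParallelCNNLSTM(input_size=3, seq_len=10, n_filters=8,
                        lstm_hidden=16, dense_hidden=32, output_size=1)
y = model(torch.randn(4, 10, 3))
print(y.shape)  # torch.Size([4, 1])
```

My own implementation below differs from this sketch: it feeds the output of the convolutions into the LSTM instead of running the two side by side.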

Did I make a mistake in the implementation? The results of my CNN-LSTM are almost exactly the same as those of the LSTM on its own. The standalone LSTM uses the exact same code as below, just without the two Conv1d layers and the ReLU activation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN_LSTM(nn.Module):
    def __init__(self, input_size, seq_len, params, output_size):
        super(CNN_LSTM, self).__init__()
        self.n_hidden = params['lstm_hidden']        # neurons in each lstm layer
        self.seq_len = seq_len                       # length of the input sequence
        self.n_layers = 1                            # nr of recurrent layers in the lstm
        self.n_filters = params['n_filters']         # nr of filters (output channels) of the first conv layer
        self.c1 = nn.Conv1d(in_channels=1, out_channels=params['n_filters'], kernel_size=1, stride=1) 
        self.c2 = nn.Conv1d(in_channels=params['n_filters'], out_channels=1, kernel_size=1, stride=1)
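        self.lstm = nn.LSTM(
            input_size=input_size,         # nr of input features
            hidden_size=self.n_hidden,     # neurons in each lstm layer
            num_layers=self.n_layers,      # nr of recurrent layers in the lstm
        )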
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=seq_len*params['lstm_hidden'],  out_features=params['dense_hidden'])
        self.dropout = nn.Dropout(p=.4)
        self.fc2 = nn.Linear(in_features=params['dense_hidden'], out_features=output_size) # output_size = nr of output features  

    def reset_hidden_state(self):
        self.hidden = (
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
        )

    def forward(self, sequences):
        out = self.c1(sequences.view(len(sequences), 1, -1))
        out = self.c2(out.view(len(out), self.n_filters, -1))
        out = F.relu(out)
        out, self.hidden = self.lstm(
            out.view(len(out), self.seq_len, -1),
            self.hidden,
        )
        out = self.flatten(out)
        out = self.fc1(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
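
To make it easier to see what the convolutional part of forward does to the data, here is a small standalone check of just those lines. It is only a sketch under assumed sizes (batch of 4, seq_len of 10, 3 input features, 8 filters are illustrative values, and conv_part is a helper name I made up); it prints the shapes before and after, and counts how many output values move when a single input value is changed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq_len, input_size, n_filters = 4, 10, 3, 8  # illustrative sizes

# Same layer definitions as in the model above.
c1 = nn.Conv1d(in_channels=1, out_channels=n_filters, kernel_size=1, stride=1)
c2 = nn.Conv1d(in_channels=n_filters, out_channels=1, kernel_size=1, stride=1)


def conv_part(sequences):
    # Mirrors the first lines of forward(), up to the reshape fed to the LSTM.
    out = c1(sequences.view(len(sequences), 1, -1))
    out = c2(out.view(len(out), n_filters, -1))
    out = F.relu(out)
    return out.view(len(out), seq_len, -1)


x = torch.randn(batch, seq_len, input_size)
y = conv_part(x)
print(x.shape, y.shape)  # both are (batch, seq_len, input_size)

# Change a single input value and see which output positions differ.
x2 = x.clone()
x2[0, 5, 1] += 1.0
changed = ~torch.isclose(conv_part(x2), y)
print(changed.sum().item(), changed.nonzero().tolist())
```

With kernel_size=1, each output value is computed from the input value at the same position only, so at most the single position [0, 5, 1] can show up as changed.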

Source for the study I am using.