Best way to use convolution over temporal data


I want to use convolution layers to predict a future state from past perceptions. Each perception is a vector of 8 features, and I was thinking of stacking several of them to represent the past. The resulting tensor would then go through one or more 1D conv layers and, if necessary, a fully connected (FC) layer.
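To make the stacking concrete, here is a minimal sketch of what I have in mind. The helper name `make_windows`, the window length of 4, and the toy data are placeholders for illustration only, not values I have settled on:

```python
def make_windows(perceptions, history):
    """Pair each run of `history` consecutive perceptions with the one that follows it."""
    inputs, targets = [], []
    for start in range(len(perceptions) - history):
        inputs.append(perceptions[start:start + history])  # stacked past states
        targets.append(perceptions[start + history])        # the state to predict
    return inputs, targets


# Toy data: 10 time steps with 8 features each (values are placeholders).
perceptions = [[float(t * 8 + f) for f in range(8)] for t in range(10)]

# history=4 is an arbitrary choice for this sketch.
inputs, targets = make_windows(perceptions, history=4)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 6 4 8
```

Each input is a stack of 4 past perceptions, and the target is the perception that comes right after it.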

However, I’m not sure about the kernel size and the other hyperparameters. Given that the input is small, is it wise to use a kernel of size 8? The output tensor would then have shape (batch, filters, 1), so I couldn’t apply another conv layer after it, could I? On the other hand, since I’m not sure there is any spatial relationship between the inputs, would a smaller kernel even make sense?
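For reference, this is the shape arithmetic behind my concern. It is a small sketch that assumes the usual output-length formula for a 1D convolution, with stride 1 and no padding unless stated otherwise; the function name `conv1d_out_len` is just something I made up for illustration:

```python
def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    """Length of a 1D convolution output along the convolved axis."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Convolved axis of length 8, no padding, stride 1.
for kernel in (8, 5, 3):
    print(kernel, conv1d_out_len(8, kernel))
# 8 1
# 5 4
# 3 6

# With one element of padding on each side, a size-3 kernel keeps the length at 8.
print(conv1d_out_len(8, 3, padding=1))  # 8
```

So a kernel of size 8 collapses the axis to a single position, while a smaller kernel (or padding) leaves room to stack further conv layers.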

Finally, is this approach relevant at all, or should I just stick with an LSTM?

Thanks a lot!