I am going to use 1D convolutions to learn representations of time series data, with an encoder-decoder architecture. The input shape is [batch_size, num_features, num_timesteps], and the encoder output should be [batch_size, length]; i.e., I want a fixed-length representation for each sequence. The problem is that the output length depends on the original length (num_timesteps) of the sequences in the batch. Is there a way to fix the length of the encoder output regardless of the num_timesteps of the original sequence?
Using the mean (à la average pooling) is the usual reduction method for unknown lengths, but by itself it is too simplistic: it is a fixed linear reduction that is invariant to reordering of the time steps. One possibility is to apply an attention layer first.
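To illustrate the plain-mean version, here is a minimal PyTorch sketch (the module and parameter names are made up for this example): a global mean over the time dimension gives a fixed-length encoding for any num_timesteps. `nn.AdaptiveAvgPool1d(1)` would do the same reduction.

```python
import torch
import torch.nn as nn

class MeanPoolEncoder(nn.Module):
    def __init__(self, num_features: int, num_channels: int):
        super().__init__()
        # 1D convolutions operate on [batch_size, num_features, num_timesteps]
        self.conv = nn.Sequential(
            nn.Conv1d(num_features, num_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x)      # [batch_size, num_channels, num_timesteps]
        return h.mean(dim=2)  # [batch_size, num_channels], for any num_timesteps

# The output length is fixed for sequences of different lengths:
enc = MeanPoolEncoder(num_features=8, num_channels=64)
print(enc(torch.randn(4, 8, 100)).shape)  # torch.Size([4, 64])
print(enc(torch.randn(4, 8, 250)).shape)  # torch.Size([4, 64])
```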
Thanks for your reply. Do you mean that I should use attention over all the time steps to get an attention vector of shape [batch_size, 1, num_timesteps], multiply it with a feature tensor of shape [batch_size, num_channels, num_timesteps], and then reduce (mean) along the time dimension (num_timesteps) to get an output of shape [batch_size, num_channels]?
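In code, I imagine something like this rough sketch, with a single learned scoring layer (all names here are mine, just for illustration). Since the softmax weights already sum to one, the weighted sum plays the role of the mean:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    def __init__(self, num_channels: int):
        super().__init__()
        # Scores each time step from its channel vector
        self.score = nn.Conv1d(num_channels, 1, kernel_size=1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: [batch_size, num_channels, num_timesteps]
        attn = F.softmax(self.score(h), dim=-1)  # [batch_size, 1, num_timesteps]
        weighted = h * attn                      # broadcast over channels
        return weighted.sum(dim=-1)              # [batch_size, num_channels]

pool = AttentionPool(num_channels=64)
print(pool(torch.randn(4, 64, 100)).shape)  # torch.Size([4, 64])
```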
Well, frankly, implementation details vary, and some are pretty complex. I’d suggest starting from an existing design for a similar problem.
Your attention description sounds right, but it is pretty generic… For the encoder, it can be a self-attention layer, to support unequal importance of time points. For some tasks, you’d also want to encode positions (and/or time-based features), for the same reason as in NLP: to make element order matter (see the sketch below).
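For example, here is a hedged sketch of sinusoidal positional encodings (as in the Transformer paper) added to the conv features before self-attention; the function name and layout are my own:

```python
import math
import torch

def sinusoidal_positions(num_timesteps: int, num_channels: int) -> torch.Tensor:
    # Returns [num_channels, num_timesteps] to match the conv feature layout
    pos = torch.arange(num_timesteps, dtype=torch.float32).unsqueeze(1)  # [T, 1]
    div = torch.exp(torch.arange(0, num_channels, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / num_channels))               # [C/2]
    pe = torch.zeros(num_timesteps, num_channels)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe.t()                                                        # [C, T]

h = torch.randn(4, 64, 100)            # [batch, channels, timesteps]
h = h + sinusoidal_positions(100, 64)  # broadcast over the batch
```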
As for decoders, I’m not too familiar with RNN-free context unrolling for convolutional decoders. I believe another attention module is one way to do it. Again, it may be better to check what was done in relevant research papers.