I have two sequences of unequal length, one for the input and the other for the target. This is not NLP data but multivariate time series data, used to predict the values of the next M days (regression) from the features of the past N days.
In this case, I wonder how to handle the data at inference time.
The data doesn't have a special start token (like `<sos>` in NLP), so I have no idea how to prepare the `tgt` argument when calling the decoder.
I'd appreciate any advice, reference code, or sources.
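One common workaround, sketched below under assumed names and sizes (`N`, `M`, `F`, `D`, and the projection layers are all hypothetical, not from the original post): since there is no `<sos>` token, seed the decoder with the last observed time step and generate the M future steps autoregressively, feeding each prediction back in as the next `tgt` token.

```python
import torch
import torch.nn as nn

# Hypothetical setup: N past days of F features in, M future days out.
N, M, F, D = 10, 5, 4, 16          # past length, horizon, features, model dim

model = nn.Transformer(d_model=D, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)
in_proj = nn.Linear(F, D)           # project raw features to model dim
out_proj = nn.Linear(D, F)          # project back to feature space

src = torch.randn(N, 1, F)          # (seq_len, batch, features)

with torch.no_grad():
    memory = model.encoder(in_proj(src))
    # No <sos> token: seed the decoder with the last observed step.
    tgt = in_proj(src[-1:])                       # (1, 1, D)
    preds = []
    for _ in range(M):
        mask = model.generate_square_subsequent_mask(tgt.size(0))
        dec = model.decoder(tgt, memory, tgt_mask=mask)
        step = out_proj(dec[-1:])                 # predict the next day
        preds.append(step)
        tgt = torch.cat([tgt, in_proj(step)], dim=0)  # feed prediction back
    forecast = torch.cat(preds, dim=0)            # (M, 1, F)
```

This mirrors greedy decoding in NLP, just with the last known value playing the role of the start token; during training you would instead use teacher forcing (the shifted ground-truth target as `tgt`).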
If I am not mistaken, the number of tokens in your decoder input doesn't have to be the same as in your encoder input (only the embedding dimension must match).
In your encoder: X Input_shape = (Num_of_tokens,Batch_size,Embed_size)
In your decoder: Y Input_Shape = (Num_of_tokens_2,Batch_size,Embed_size)
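A quick sketch to confirm those shapes (the concrete sizes here are just for illustration): feeding `nn.Transformer` a source and target of different lengths shows the output follows the decoder's length.

```python
import torch
import torch.nn as nn

E = 16  # Embed_size
model = nn.Transformer(d_model=E, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1)
src = torch.randn(10, 2, E)   # (Num_of_tokens, Batch_size, Embed_size)
tgt = torch.randn(7, 2, E)    # (Num_of_tokens_2, Batch_size, Embed_size)
out = model(src, tgt)
print(out.shape)              # (7, 2, 16): Num_of_tokens_2, not Num_of_tokens
```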
Take a look at MultiheadAttention — PyTorch 1.7.0 documentation
So where the two streams meet (cross-attention), the Keys and Values come from the encoder and the Queries from the decoder.
QK^T gives you a (Num_of_tokens_2 x Num_of_tokens) matrix, and then
Sm(QK^T) * V gives you a (Num_of_tokens_2 x Embed_size) matrix, which is the shape of the decoder's output after all the layers.
Note: all the layers preserve the shape of their input.
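You can check those two matrix shapes directly with `nn.MultiheadAttention` (sizes below are illustrative): the attention weights are (Num_of_tokens_2 x Num_of_tokens) and the output keeps the queries' length.

```python
import torch
import torch.nn as nn

E, B = 16, 2                               # Embed_size, Batch_size
attn = nn.MultiheadAttention(embed_dim=E, num_heads=4)
q = torch.randn(7, B, E)                   # Queries from the decoder
kv = torch.randn(10, B, E)                 # Keys/Values from the encoder
out, weights = attn(q, kv, kv)
print(out.shape)      # (7, 2, 16): Num_of_tokens_2 x Batch x Embed_size
print(weights.shape)  # (2, 7, 10): Sm(QK^T), queries x keys (head-averaged)
```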
I would also highly recommend coding the Transformer from scratch; it's not that complicated and it will give you a better understanding of the input/output shapes.
Here's a good reference (but pay attention and check that it corresponds to the paper; there may be a few inconsistencies with the original model, or oversimplifications).
Hope this was somewhat helpful.