Is it feasible to change the positional encoding in BERT to a relative embedding? I intend to train BERT for time series prediction. During the self-supervised pretraining stage, I want to use long-sequence classification, e.g. 365 time steps, to capture time-varying seasonalities in a time series, but later during fine-tuning I want to do short-sequence classification, e.g. 30 time steps at a time. In this case a relative positional encoding would make more sense, but I'm not sure whether this is logical in the BERT model.
All positional encoding strategies have the same purpose: injecting positional information into the model. It's just that relative positional encodings often work better for text tasks, since the relative position between words (order + distance) is more informative than their absolute position.
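In fact, if you happen to be using the Hugging Face `transformers` implementation (an assumption on my part), you don't need to modify the model internals at all: `BertConfig` exposes a `position_embedding_type` option that supports `"relative_key"` and `"relative_key_query"` (Shaw-et-al.-style relative positions injected into the attention scores) in addition to the default `"absolute"`. A minimal sketch:

```python
from transformers import BertConfig, BertModel

# Hyperparameters here are placeholders, not recommendations.
config = BertConfig(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    max_position_embeddings=512,             # still an upper bound on sequence length
    position_embedding_type="relative_key",  # relative instead of absolute positions
)
model = BertModel(config)
```

Note that `max_position_embeddings` still caps the longest sequence you can feed in, because the relative-distance embedding table is sized from it.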
It seems this might be beneficial to you as well. I'm just not sure how BERT fits your task. BERT is defined by its two learning objectives: Masked Language Modeling (MLM), where ~15% of input tokens are masked and then predicted, and Next Sentence Prediction (NSP), where the prediction is True or False depending on whether the two given sentences are adjacent in the corpus.
Or are you generally referring to an encoder-only model?
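For concreteness, the MLM masking step boils down to something like the sketch below (simplified: the original recipe replaces 80% of the selected positions with `[MASK]`, 10% with a random token, and leaves 10% unchanged, whereas this masks all selected positions):

```python
import torch

def mask_for_mlm(input_ids, mask_token_id, mlm_prob=0.15):
    """Simplified BERT-style MLM masking: select ~15% of positions as targets."""
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mlm_prob  # ~15% of positions
    labels[~selected] = -100                           # ignored by CrossEntropyLoss
    masked = input_ids.clone()
    masked[selected] = mask_token_id                   # replace targets with [MASK]
    return masked, labels
```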
Thanks for your reply. If I understand correctly, I can use any positional encoding in BERT. Note that I work with satellite data, which is noisy and has gaps, so I thought BERT would be more suitable than other encoder-only models. The self-supervised pretraining will predict the 15% masked data, and later I'll fine-tune on a small labelled dataset. I wanted to use relative dates because the pretrained long-sequence classifier is later adapted to perform sequence-of-sequences classification during fine-tuning. Plus, I need information localisation so that I could at least get relative date predictions.
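If it helps, here is a hypothetical sketch of that pretrain-long / fine-tune-short setup, again assuming the Hugging Face `BertModel` with relative positions; `n_channels` is a placeholder for your number of satellite bands, and the linear projection stands in for whatever input embedding you use for continuous time-series values:

```python
import torch
from transformers import BertConfig, BertModel

config = BertConfig(
    hidden_size=128,
    num_hidden_layers=4,
    num_attention_heads=4,
    max_position_embeddings=512,             # must cover the longest (365-step) window
    position_embedding_type="relative_key",
)
encoder = BertModel(config)

# Time-series inputs bypass the word embeddings: project each time step's
# observation vector to the hidden size and pass it via `inputs_embeds`.
n_channels = 10                              # placeholder: number of spectral bands
project = torch.nn.Linear(n_channels, config.hidden_size)

long_window = torch.randn(2, 365, n_channels)   # pretraining: 365 time steps
short_window = torch.randn(2, 30, n_channels)   # fine-tuning: 30 time steps

out_long = encoder(inputs_embeds=project(long_window))
out_short = encoder(inputs_embeds=project(short_window))  # same weights, shorter sequence
print(out_long.last_hidden_state.shape)   # torch.Size([2, 365, 128])
print(out_short.last_hidden_state.shape)  # torch.Size([2, 30, 128])
```

Because positions are encoded relatively in the attention scores rather than added as absolute indices, the same weights apply cleanly to both window lengths.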
I concur with the previous response that you shouldn't simply drop the positional embedding, since it is what gives the model each token's position. In a transformer, all tokens get dot-producted with each other, so without positional information you lose the position of each token, and in NLP word position matters: 'cat sat on the mat' is quite different from 'mat sat on the cat'. So I wouldn't replace the positional embedding with some other, non-positional embedding.
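That 'cat sat on the mat' point can be made concrete: without any positional signal, self-attention is permutation-equivariant, so shuffling the input tokens just shuffles the output, and the model cannot distinguish the two sentences. A minimal demonstration (illustrative only):

```python
import torch

torch.manual_seed(0)
attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)

x = torch.randn(1, 5, 16)       # 5 "tokens", no positional encoding added
perm = torch.randperm(5)        # a random reordering of the sequence

out, _ = attn(x, x, x)
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])

# Output for the shuffled input is just the shuffled output:
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True
```

A relative encoding (as discussed above) still injects positional information, just via token-to-token distances instead of absolute indices, so it avoids this problem too.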