Hi!
I’m trying to use dot-product attention for a time series prediction task (not language related), and I’m wondering if anyone has a good explanation of what each value represents and how to interpret them, since most sources focus on NLP and the transformer architecture.
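
To make the question concrete, here is a minimal sketch of what I understand by dot-product attention, applied to a short window of a time series. It assumes the scaled variant (scores divided by the square root of the key dimension) and uses the raw time steps directly as queries, keys and values. In a real model these would normally be learned projections of the input. The function names and the toy `series` are placeholders I made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention on plain Python lists.

    queries: list of T_q vectors of size d
    keys:    list of T_k vectors of size d
    values:  list of T_k vectors (any size)
    Returns (outputs, weights), where weights[i][j] is how much
    query step i attends to key step j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query time step to every key time step.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)  # non-negative, sums to 1 over key steps
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w_t * v[j] for w_t, v in zip(w, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Hypothetical toy window: 4 time steps, 2 features per step.
# Self-attention without learned projections, so Q = K = V = series.
series = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9]]
outputs, weights = dot_product_attention(series, series, series)

for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

In this sketch, row `t` of `weights` is a distribution over the time steps in the window, and `outputs[t]` is the corresponding weighted average of the value vectors. What I’m unsure about is how to interpret those quantities when the inputs are time steps rather than tokens.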