Attention networks basics

Hi all,

I'm trying to implement a paper that uses an attention network on top of a CNN, and I'm a bit lost.

The paper describes the attention block as:

attention (w/ 5x1 conv 8 filters + BN + tanh + 5x1 conv 1 filter + BN + softmax)

The input to this block (I think) has dimensions batch x 1 (channel) x length.

Can someone explain how the attention aspect works and, if possible, share some (pseudo)code?
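Here's my current reading of the block as a sketch, in case it helps pin down where I'm confused. Everything here is an assumption on my part: I'm guessing PyTorch, "same" padding so the length is preserved, softmax taken over the length axis, and that the attention weights multiply the input elementwise.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """My guess at the paper's attention block (not confirmed by the paper):
    conv(5x1, 8 filters) -> BN -> tanh -> conv(5x1, 1 filter) -> BN -> softmax.
    """
    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2),  # 5x1 conv, 8 filters
            nn.BatchNorm1d(8),
            nn.Tanh(),
            nn.Conv1d(8, 1, kernel_size=5, padding=2),  # 5x1 conv, 1 filter
            nn.BatchNorm1d(1),
        )

    def forward(self, x):
        # x: (batch, 1, length)
        # softmax over the length axis -> weights sum to 1 per sample
        weights = torch.softmax(self.score(x), dim=-1)
        # assumption: attention weights scale the input elementwise
        return x * weights

x = torch.randn(4, 1, 100)
out = AttentionBlock()(x)
print(out.shape)  # same shape as the input: (4, 1, 100)
```

Does that match how this kind of block is normally set up, or should the output be a weighted sum over the length dimension instead of an elementwise product?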