Hi all
I'm trying to implement a paper that applies an attention block on top of a CNN, and I'm a bit lost.
The paper describes the attention block as:
attention (w/ 5x1 conv 8 filters + BN + tanh + 5x1 conv 1 filter + BN + softmax)
The input to this block (I think) has dimensions batch x 1 (filter/channel) x length.
Can someone explain how the attention aspect works and, if possible, share some (pseudo)code?
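For context, here's my current reading of that description as a PyTorch sketch. The padding choice (to keep the length unchanged) and applying the softmax over the length dimension are my assumptions, not stated in the paper:

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """My reading of: 5x1 conv (8 filters) + BN + tanh,
    then 5x1 conv (1 filter) + BN + softmax."""

    def __init__(self):
        super().__init__()
        # padding=2 keeps the length dimension unchanged (my assumption)
        self.conv1 = nn.Conv1d(1, 8, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(8)
        self.conv2 = nn.Conv1d(8, 1, kernel_size=5, padding=2)
        self.bn2 = nn.BatchNorm1d(1)

    def forward(self, x):
        # x: (batch, 1, length)
        a = torch.tanh(self.bn1(self.conv1(x)))
        a = self.bn2(self.conv2(a))          # (batch, 1, length)
        # softmax over the length dimension, so the weights sum to 1
        # and can be used to take a weighted sum of the input features
        return torch.softmax(a, dim=-1)
```

My (possibly wrong) understanding is that the output is then multiplied element-wise with the input and summed over length, i.e. a weighted average of the sequence. Is that right?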
Thanks