I am implementing a model based on MemoryNetworks. I have triplet data of (context, query, answer), and I want to calculate attention over the context: the attention weights indicate which sentences in the context should be focused on.
To form mini-batches, I zero-pad the context data. The following is the resulting attention data; the 0 values come from embeddings of the zero-padded context. For such data, I want to apply softmax only to indices 0, 1, 2, 3, and the last one, so that the model ignores the zero-padded columns.
How can I realize this? I would like to know the standard technique for combining zero-padding with an attention mechanism. Sorry if this is a general deep learning question.
Before softmax: torch.bmm(context, q)
109.8601
77.6376
68.3927
199.1673
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
348.0155
[torch.cuda.FloatTensor of size 15 (GPU 0)]
After softmax: F.softmax(torch.bmm(context, q))
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
[torch.cuda.FloatTensor of size 15 (GPU 0)]
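One common technique is a *masked softmax*: before calling softmax, fill the padded positions with `-inf` so they receive exactly zero probability. Below is a minimal sketch of that idea; `masked_softmax` is a hypothetical helper name, and for illustration the mask is derived from the zero logits shown above. In a real model you should build the mask from the known padding positions (e.g., the sentence lengths), since a genuine logit could also happen to be exactly 0.

```python
import torch
import torch.nn.functional as F

def masked_softmax(scores, mask):
    """Softmax over `scores` that ignores positions where mask == 0.

    scores: attention logits, e.g. the result of torch.bmm(context, q)
    mask:   same-shape tensor; 1 for real sentences, 0 for zero-padding
    """
    # exp(-inf) == 0, so padded slots contribute nothing to the softmax
    scores = scores.masked_fill(mask == 0, float('-inf'))
    return F.softmax(scores, dim=-1)

# The 15-slot logits from the question:
scores = torch.tensor([109.8601, 77.6376, 68.3927, 199.1673,
                       0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
                       348.0155])
# Illustrative mask only -- in practice, compute it from padding positions
mask = (scores != 0).float()
attn = masked_softmax(scores, mask)
```

After this, `attn` sums to 1 over the real sentences only, and the padded slots get exactly 0 weight, so gradients never flow through them. The same masking works batched: keep `scores` and `mask` shaped `(batch, num_sentences)` and softmax along `dim=-1`.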