How to ignore specific columns for calculating softmax attention

I am implementing a model based on MemoryNetworks. I have triplet data of (context, query, answer), and I want to calculate attention. The attention indicates which sentences in the context should be focused on.

To form mini-batches, I zero-pad the context data. The following shows the resulting attention scores, where the 0 values come from the embeddings of the zero-padded context.
For this data, I want to apply softmax only to indices 0, 1, 2, 3, and the last one, so that the model ignores the zero-padded columns.

How can I achieve this? I would like to know the usual technique for combining zero-padding with an attention mechanism. I apologize if this is a general deep learning question rather than a PyTorch-specific one.

Before softmax: torch.bmm(contex, q)
 109.8601
  77.6376
  68.3927
 199.1673
   0.0000
   0.0000
   0.0000
   0.0000
   0.0000
   0.0000
   0.0000
   0.0000
   0.0000
   0.0000
 348.0155
[torch.cuda.FloatTensor of size 15 (GPU 0)]

After softmax: F.softmax(torch.bmm(contex, q))
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
[torch.cuda.FloatTensor of size 15 (GPU 0)]

From what I understand of your question, you want to apply softmax only to the non-zero values in the tensor.

Assuming the tensor is named a, you can use
a = a - torch.where(a > 0, torch.zeros_like(a), torch.ones_like(a) * float('inf'))
This turns every zero value into -inf, and applying softmax over the result gives those positions a weight of exactly zero.
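Here is a minimal, self-contained sketch of that masking idea applied to the scores from your example, assuming a holds the pre-softmax output of torch.bmm(contex, q) for a single example:

import torch
import torch.nn.functional as F

# Pre-softmax attention scores; the zeros come from zero-padded context sentences.
a = torch.tensor([109.8601, 77.6376, 68.3927, 199.1673,
                  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                  348.0155])

# Replace every zero (padded) score with -inf so softmax assigns it zero weight;
# non-zero scores are left unchanged (a - 0).
a = a - torch.where(a > 0, torch.zeros_like(a), torch.ones_like(a) * float('inf'))

weights = F.softmax(a, dim=0)
print(weights)  # padded positions are exactly 0, the remaining weights sum to 1

Note that this masks by value (a > 0), so it also relies on real scores never being exactly zero; in practice you can instead build the mask from the known padding positions (e.g., from the sentence lengths) and apply it the same way.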
