I know it's really simple, but I am new to NNs and I am having a lot of difficulty understanding the relation between the math and the PyTorch code. I am trying to replicate a paper that uses attention weights, and I need to implement this feed-forward neural network with two inputs.

```
$$ c_i = W_1 \tanh(W_2 m_i + W_3 v_a + b_i) $$
```

`$m_i$` is the embedding vector of a single word (one for each word in the sentence) and `$v_a$` is the embedding vector of

The model parameters are: `$$W_1 \in \mathbb{R}^{1 \times d}, W_2 \in \mathbb{R}^{d \times d}, W_3 \in \mathbb{R}^{d \times d}, b_1 \in {}$$`

The resulting `$\{c_1, c_2, \ldots, c_N\}$`, after being passed through a softmax, will represent the weight that is given to each word in the sentence.
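For reference, the softmax step over the scores can be written out as follows. The symbol `$\alpha_i$` for the weight of word `$i$` is a name assumed here for illustration; it does not appear in the original post.

```
$$ \alpha_i = \frac{\exp(c_i)}{\sum_{j=1}^{N} \exp(c_j)}, \qquad \sum_{i=1}^{N} \alpha_i = 1 $$
```

Note that the normalization runs over the `$N$` words of the sentence, so each word ends up with one scalar weight.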

Hi, I guess your model will look something like this:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class YourModel(nn.Module):
    def __init__(self):
        super(YourModel, self).__init__()
        self.lin1 = nn.Linear(5, 5)  # plays the role of W_2 (plus a bias)
        self.lin2 = nn.Linear(5, 5)  # plays the role of W_3 (plus a bias)

    def forward(self, m_i, v_a):
        y1 = self.lin1(m_i)
        y2 = self.lin2(v_a)
        y = torch.tanh(y1 + y2)   # torch.tanh replaces the deprecated F.tanh
        y = F.softmax(y, dim=-1)  # pass dim explicitly to avoid a warning
        return y
```

Then,

```
model = YourModel()
m_i = Variable(torch.Tensor([1,2,3,4,5]))
v_a = Variable(torch.Tensor([6,7,8,9,10]))
output = model(m_i, v_a)  # call the module itself rather than .forward()
print(output)
```

You will get output like the following (the exact values depend on the random weight initialization):

```
Variable containing:
0.0563
0.4156
0.4156
0.0563
0.0563
[torch.FloatTensor of size 5]
```


Thanks Ken, that's a nice example, I got the intuition now!

One quick question: I think I should also pass the `y` variable through a linear layer, since there is a `W_1` multiplying it in the original function.

$$ c_i = W_1 \tanh(W_2 m_i + W_3 v_a + b_i) $$

Do you think this is necessary, or is just applying the softmax straight away OK?

```
def forward(self, m_i, v_a):
    y1 = self.lin1(m_i)
    y2 = self.lin2(v_a)
    y = torch.tanh(y1 + y2)
    y = self.lin3(y)          # W_1; self.lin3 must also be defined in __init__
    y = F.softmax(y, dim=-1)
    return y
```

Yes, you need to add `lin3`
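Putting the pieces of this thread together, here is a minimal sketch of how the full formula might look when all `N` words of a sentence are scored at once. The class name `WordAttention`, the layer names, the choice to keep a single bias, and the sizes are all assumptions made for illustration; the paper's exact setup is not given in the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WordAttention(nn.Module):
    """Sketch of c_i = W_1 tanh(W_2 m_i + W_3 v_a + b), softmax over words."""

    def __init__(self, d):
        super().__init__()
        self.lin_m = nn.Linear(d, d)              # W_2 and the bias b
        self.lin_a = nn.Linear(d, d, bias=False)  # W_3 (a second bias is redundant)
        self.lin_c = nn.Linear(d, 1, bias=False)  # W_1, maps d -> 1

    def forward(self, m, v_a):
        # m: (N, d) word embeddings; v_a: (d,) vector shared by all words
        hidden = torch.tanh(self.lin_m(m) + self.lin_a(v_a))  # (N, d) by broadcasting
        c = self.lin_c(hidden).squeeze(-1)                    # (N,) one score per word
        return F.softmax(c, dim=0)                            # normalize across the N words


torch.manual_seed(0)
attn = WordAttention(d=5)
m = torch.randn(4, 5)   # hypothetical sentence of 4 words, d = 5
v_a = torch.randn(5)
weights = attn(m, v_a)
print(weights.shape, weights.sum())
```

The main difference from the snippets above is that the softmax is taken across words (`dim=0` on a vector of `N` scores) rather than across the features of a single word. With only one `m_i` as input, `lin3` would produce a single number and its softmax would always be 1.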