I know it's really simple, but I am new to neural networks and I am having a lot of difficulty understanding the relation between the math and the PyTorch code. I am trying to replicate a paper that uses attention weights, and I need to implement this feed-forward neural network with two inputs.
$$ c_i = W_1 \tanh(W_2 m_i + W_3 v_a + b_1) $$
where $m_i$ is the embedding vector of a single word (one for each word in the sentence) and $v_a$ is the embedding vector of
The model parameters are: $$W_1 \in \mathbb{R}^{1 \times d},\; W_2 \in \mathbb{R}^{d \times d},\; W_3 \in \mathbb{R}^{d \times d},\; b_1 \in \mathbb{R}^{d}$$
The resulting scores $\{c_1, c_2, \dots, c_N\}$, after being passed through a softmax, will represent the weight given to each word in the sentence.
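Spelled out, that softmax step is just the standard normalization of the scores into weights that sum to one (writing $\alpha_i$ for the weight of word $i$):
$$ \alpha_i = \frac{\exp(c_i)}{\sum_{j=1}^{N} \exp(c_j)}, \qquad i = 1, \dots, N $$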
Hi, I guess your model will look something like this:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class YourModel(nn.Module):
    def __init__(self):
        super(YourModel, self).__init__()
        self.lin1 = nn.Linear(5, 5)   # plays the role of W_2 (with its bias)
        self.lin2 = nn.Linear(5, 5)   # plays the role of W_3

    def forward(self, m_i, v_a):
        y1 = self.lin1(m_i)
        y2 = self.lin2(v_a)
        y = F.tanh(y1 + y2)
        y = F.softmax(y, dim=0)       # softmax over the single 5-dim vector
        return y
Then,
model = YourModel()
m_i = Variable(torch.Tensor([1,2,3,4,5]))
v_a = Variable(torch.Tensor([6,7,8,9,10]))
output = model.forward(m_i, v_a)
you will get output like this:
Variable containing:
0.0563
0.4156
0.4156
0.0563
0.0563
[torch.FloatTensor of size 5]
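Note that this runs the model on a single 5-dimensional vector. If you want to feed every word of a sentence at once, you can stack the word embeddings into a matrix, since nn.Linear acts on the last dimension. A minimal sketch, assuming a 3-word sentence with d = 5 and random data just for illustration:

# Hypothetical shapes: 3 words in the sentence, embeddings of size d = 5.
M = Variable(torch.randn(3, 5))   # one row per word
v_a = Variable(torch.randn(5))    # the second input vector

y1 = model.lin1(M)                # shape (3, 5)
y2 = model.lin2(v_a)              # shape (5,), broadcast to every row below
y = torch.tanh(y1 + y2)           # shape (3, 5)
print(y.size())                   # torch.Size([3, 5])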
Thanks Ken, that's a nice example, I got the intuition now!
One quick question: I think I should also pass the y variable through a linear layer, since there is a $W_1$ multiplying it in the original function.
$$ c_i = W_1 \left( \tanh(W_2 m_i + W_3 v_a + b_1) \right) $$
Do you think this is necessary, or is applying the softmax straight away OK?
def forward(self, m_i, v_a):
    y1 = self.lin1(m_i)
    y2 = self.lin2(v_a)
    y = F.tanh(y1 + y2)
    y = self.lin3(y)          # extra linear layer for W_1
    y = F.softmax(y, dim=0)
    return y
Yes, you need to add lin3 for the $W_1$ term.
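For completeness, here is a minimal sketch of the whole model with lin3 included. The class name, the bias choices, and the size of lin3 (output dimension 1, so that each word gets a single score $c_i$ and the softmax runs over the $N$ words) are my assumptions based on $W_1 \in \mathbb{R}^{1 \times d}$, not something taken from the paper:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class AttentionScorer(nn.Module):                # hypothetical name
    def __init__(self, d=5):
        super(AttentionScorer, self).__init__()
        self.lin1 = nn.Linear(d, d)              # W_2 together with the bias b_1
        self.lin2 = nn.Linear(d, d, bias=False)  # W_3
        self.lin3 = nn.Linear(d, 1, bias=False)  # W_1: hidden vector -> scalar score

    def forward(self, M, v_a):
        # M: (N, d) word embeddings, v_a: (d,) second input vector
        y = torch.tanh(self.lin1(M) + self.lin2(v_a))   # (N, d)
        c = self.lin3(y).squeeze(1)                     # (N,) one score per word
        return F.softmax(c, dim=0)                      # N weights that sum to 1

model = AttentionScorer()
weights = model(Variable(torch.randn(4, 5)), Variable(torch.randn(5)))
print(weights)   # 4 attention weights, one per word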
