From math to PyTorch

I know it's really simple, but I am new to neural networks and I am having a lot of difficulty understanding the relation between the math and the PyTorch code. I am trying to replicate a paper that uses attention weights, and I need to implement this feed-forward neural network with two inputs.

$$ c_i = W_1 \tanh(W_2 m_i + W_3 v_a + b_i) $$

$m_i$ is the embedding vector of a single word (one for each word in the sentence) and $v_a$ is the embedding vector of
The model parameters are: $$W_1 \in \mathbb{R}^{1 \times d}, W_2 \in \mathbb{R}^{d \times d}, W_3 \in \mathbb{R}^{d \times d}, b_1 \in {}$$

The resulting $\{c_1, c_2, \ldots, c_N\}$, after being passed through a softmax, represent the weight given to each word in the sentence.
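
Written out, the softmax step described above would be the following (the symbol $\alpha_i$ for the resulting weight of word $i$ is an assumed name, not taken from the paper):

$$ \alpha_i = \frac{\exp(c_i)}{\sum_{j=1}^{N} \exp(c_j)}, \quad i = 1, \ldots, N $$

Note that the sum runs over the $N$ words of the sentence, so the weights $\alpha_1, \ldots, \alpha_N$ are positive and add up to 1.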

Hi, I guess your model will look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F


class YourModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(5, 5)  # applied to m_i (the W_2 term, with bias)
        self.lin2 = nn.Linear(5, 5)  # applied to v_a (the W_3 term, with bias)

    def forward(self, m_i, v_a):
        y1 = self.lin1(m_i)
        y2 = self.lin2(v_a)
        y = torch.tanh(y1 + y2)   # F.tanh is deprecated
        y = F.softmax(y, dim=-1)  # softmax needs an explicit dim
        return y


model = YourModel()
# Plain tensors are enough; the Variable wrapper is no longer needed.
m_i = torch.tensor([1., 2., 3., 4., 5.])
v_a = torch.tensor([6., 7., 8., 9., 10.])
output = model(m_i, v_a)  # call the module rather than .forward()

`output` is then a float tensor of size 5 (the exact values depend on the random weight initialization).


Thanks Ken, that's a nice example, I get the intuition now!

One quick question: I think I should also pass the y variable through a linear layer, since W_1 multiplies it in the original function.

$$ c_i = W_1 \tanh(W_2 m_i + W_3 v_a + b_i) $$

Do you think this is necessary, or is applying the softmax straight away OK?

def forward(self, m_i, v_a):
    y1 = self.lin1(m_i)
    y2 = self.lin2(v_a)
    y = torch.tanh(y1 + y2)
    y = self.lin3(y)  # the W_1 term
    y = F.softmax(y, dim=-1)
    return y

Yes, you need to add lin3 :slight_smile:
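
Putting the thread together, here is a minimal sketch of the whole formula, not necessarily the paper's exact implementation. It assumes that `lin3` maps from `d` to 1 (matching $W_1 \in R^{1 \times d}$), that there is a single bias vector of size `d` inside the tanh (the shape of the bias is not given above), and that the softmax is taken across the N words of the sentence rather than within a single word's output, since a softmax over one scalar is always 1. The class name `AttentionScores` and the sizes `N = 4`, `d = 5` are made up for illustration.

```python
import torch
import torch.nn as nn


class AttentionScores(nn.Module):
    """Sketch of c_i = W_1 tanh(W_2 m_i + W_3 v_a + b), then softmax over words."""

    def __init__(self, d):
        super().__init__()
        # W_2 (d x d); its bias plays the role of the single bias term b (assumption).
        self.lin1 = nn.Linear(d, d)
        # W_3 (d x d); no bias, so the sum inside tanh has exactly one bias vector.
        self.lin2 = nn.Linear(d, d, bias=False)
        # W_1 (1 x d); no bias, because the formula has none outside the tanh.
        self.lin3 = nn.Linear(d, 1, bias=False)

    def forward(self, m, v_a):
        # m: (N, d), one embedding row per word; v_a: (d,)
        hidden = torch.tanh(self.lin1(m) + self.lin2(v_a))  # (N, d) via broadcasting
        c = self.lin3(hidden).squeeze(-1)                    # (N,), one score c_i per word
        return torch.softmax(c, dim=0)                       # weights over the N words


torch.manual_seed(0)
N, d = 4, 5                 # hypothetical sentence length and embedding size
model = AttentionScores(d)
m = torch.randn(N, d)       # stand-in word embeddings
v_a = torch.randn(d)        # stand-in for the second input vector
weights = model(m, v_a)
print(weights.shape)        # torch.Size([4])
print(weights.sum())        # close to 1, up to floating-point error
```

Passing all word embeddings at once as an `(N, d)` matrix lets the linear layers score every word in one call, and `dim=0` makes the softmax normalize across words, which is what the description of $c_1, \ldots, c_N$ asks for.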