PyTorch Torchtext padding packing attention


(Alastair) #1

Hey Everybody,

I have a question about NLP, specifically LSTM models with attention. I am trying to implement an LSTM with attention to build a recommendation system for the CiteULike dataset. The goal is to predict whether a user would be interested in a scientific article based on their behavior and the behavior of other users. I create two embeddings: one for the user and one for the title text of the scientific article (using pre-trained GloVe embeddings). The title embeddings are then fed into an LSTM, “hopefully” with attention.

I am quite new to all this, and I suspect there are mistakes, especially when it comes to padding and packing. Does anyone know how to add packing and padding? I am having difficulty figuring out how to add them and what to use as the output. Should it be the last hidden state, and if so, how do I extract it and “unpack” the relevant information?

Any help would be greatly appreciated. I have added the model below. Currently the model runs (training, validation, testing, etc.); however, the performance is quite bad and training is extremely slow (I have no GPU to run CUDA).

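import torch
import torch.nn as nn
import torch.nn.functional as F

# embedding_dim, n_hidden and l1_hidden are module-level hyperparameters defined elsewhere.
class CFNN(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim=embedding_dim, n_hidden=n_hidden, l1_hidden=l1_hidden):
        super().__init__()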
        self.user_emb = nn.Embedding(num_users, embedding_dim)
        self.item_emb = nn.Embedding(num_items, embedding_dim)
        
        #[45, 208]
        #self.lin1 = nn.Linear(n_hidden+embedding_dim, l1_hidden)
        self.lin1 = nn.Linear(embedding_dim+n_hidden, l1_hidden)
        self.lin2 = nn.Linear(l1_hidden, 1)
        self.drop0 = nn.Dropout(0.1)
        self.drop1 = nn.Dropout(0.1)
                
        # RNN encoding of the item title (bidirectional, so outputs have size 2 * n_hidden)
        self.rnn = nn.LSTM(embedding_dim, n_hidden, bidirectional=True)
        self.rnnlin = nn.Linear(n_hidden * 2, n_hidden)
        
        self.sigmoid = nn.Sigmoid()
        
        # ATTENTION
        self.linA1 = nn.Linear(n_hidden * 2, n_hidden)
        self.linA2 = nn.Linear(n_hidden, 1)
        
    def forward(self, u, v, hidden, seq_lengths):
        #print(v.shape)
        U = self.user_emb(u)
        V = self.item_emb(v)
        
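        # rnnout: (seq_len, batch, 2 * n_hidden); hn, cn: final hidden and cell states.
        # The original next line, "hn, cn = hidden", overwrote these final states
        # with the initial ones, so it is removed.
        rnnout, (hn, cn) = self.rnn(V, hidden)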
        
        itemx1 = self.linA1(rnnout)
        x2 = torch.tanh(itemx1)
        x3 = self.linA2(x2)
        
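        # Softmax over the sequence dimension (dim 0, since the LSTM is not batch_first)
        alpha = F.softmax(x3, dim=0)

        # Attention-weighted sum of the LSTM outputs. alpha already sums to 1 over dim 0,
        # so the original factor (1/rnnout[0]), an element-wise division by the first
        # time step's output, is dropped.
        score = torch.sum(rnnout * alpha, dim=0)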
      
        V_rnn = F.relu(self.rnnlin(score))
        x = torch.cat([U, V_rnn], dim=1)
       
        ## Classification
        x = self.drop0(x)
        x = F.relu(self.lin1(x))
        x = self.drop1(x)
        
        x = self.lin2(x)
        x = self.sigmoid(x)
        return x
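
For reference, here is a minimal sketch of how padding, packing, and a masked attention step could fit together for the title branch. It is not a drop-in fix for the model above. The class name `TitleEncoder`, the `pad_idx=0` convention, the toy token ids, and the use of `batch_first=True` (the model above uses the default time-major layout) are all assumptions made for illustration. It relies on `pad_sequence`, `pack_padded_sequence`, and `pad_packed_sequence` from `torch.nn.utils.rnn`, and it returns both an attention-weighted context vector and the last hidden state, so either could serve as the title representation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence, pad_sequence


class TitleEncoder(nn.Module):
    """Hypothetical helper: BiLSTM over padded titles with masked attention."""

    def __init__(self, vocab_size, embedding_dim, n_hidden, pad_idx=0):
        super().__init__()
        # padding_idx keeps the embedding of the pad token fixed at zero
        self.emb = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.rnn = nn.LSTM(embedding_dim, n_hidden, bidirectional=True, batch_first=True)
        self.att1 = nn.Linear(2 * n_hidden, n_hidden)
        self.att2 = nn.Linear(n_hidden, 1)

    def forward(self, tokens, seq_lengths):
        # tokens: (batch, max_len) padded token ids
        # seq_lengths: (batch,) true lengths, each at least 1
        max_len = tokens.size(1)
        emb = self.emb(tokens)

        # Pack so the LSTM skips padded time steps; lengths must live on the CPU
        packed = pack_padded_sequence(
            emb, seq_lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        packed_out, (hn, cn) = self.rnn(packed)

        # Unpack back to (batch, max_len, 2 * n_hidden); padded steps are zeros
        out, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=max_len)

        # hn: (2, batch, n_hidden) for one bidirectional layer.
        # hn[-2] is the forward direction, hn[-1] the backward direction.
        last_hidden = torch.cat([hn[-2], hn[-1]], dim=1)

        # Attention scores, with padded positions masked out before the softmax
        scores = self.att2(torch.tanh(self.att1(out))).squeeze(-1)  # (batch, max_len)
        positions = torch.arange(max_len, device=tokens.device)[None, :]
        mask = positions < seq_lengths.to(tokens.device)[:, None]
        scores = scores.masked_fill(~mask, float("-inf"))
        alpha = F.softmax(scores, dim=1)

        # Weighted sum over time steps
        context = torch.sum(out * alpha.unsqueeze(-1), dim=1)  # (batch, 2 * n_hidden)
        return context, last_hidden, alpha


# Toy usage with made-up token ids; 0 is reserved for padding
torch.manual_seed(0)
titles = [torch.tensor([5, 2, 9, 4]), torch.tensor([7, 3]), torch.tensor([8, 6, 1])]
seq_lengths = torch.tensor([len(t) for t in titles])
tokens = pad_sequence(titles, batch_first=True, padding_value=0)  # (3, 4)

encoder = TitleEncoder(vocab_size=10, embedding_dim=8, n_hidden=6)
context, last_hidden, alpha = encoder(tokens, seq_lengths)
print(context.shape, last_hidden.shape, alpha.shape)
```

In the model above, either `context` or `last_hidden` could take the place of `score` before `self.rnnlin`, since both have size `2 * n_hidden`. Because packing lets the LSTM skip padded steps and the mask gives padded positions zero attention weight, the amount of padding in a batch should not change the result.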