Using pre-trained sentence embeddings in PyTorch

I am new to PyTorch and am trying to perform a sentence classification task with it.

I have averaged the word embeddings (GloVe) in each sentence to form the sentence embedding, so every sentence embedding has the same dimension.
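For concreteness, this is roughly how each sentence embedding is built (a sketch; glove here is assumed to be a dict mapping each word to its NumPy vector):

import numpy as np

def sentence_embedding(sentence, glove, embed_dim):
    # Average the GloVe vectors of the in-vocabulary words;
    # fall back to a zero vector if no word is in the vocabulary.
    vectors = [glove[w] for w in sentence.split() if w in glove]
    if not vectors:
        return np.zeros(embed_dim, dtype=np.float32)
    return np.mean(vectors, axis=0).astype(np.float32)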

As per my understanding, since I already have the embeddings, I don't need an embedding layer before the LSTM.
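In other words, I expect the LSTM to accept the pre-computed vectors directly as a float tensor of shape (seq_len, batch, input_size). A minimal, standalone example of what I mean (dimensions made up):

import torch
import torch.autograd as autograd
import torch.nn as nn

lstm = nn.LSTM(input_size=300, hidden_size=128)   # e.g. 300-dim GloVe vectors
# one sentence embedding, treated as a sequence of length 1 with batch size 1
x = autograd.Variable(torch.zeros(1, 1, 300))     # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)                      # output: (1, 1, 128)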

My model is as follows:

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class LSTM1(nn.Module):

    def __init__(self, args):
        super(LSTM1, self).__init__()
        self.args = args
        self.hidden_dim = args.hidden_dim
        self.lstm = nn.LSTM(args.embed_dim, args.hidden_dim)
        self.hidden2tag = nn.Linear(args.hidden_dim, 2)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (h_0, c_0), each of shape (num_layers=1, batch=1, hidden_dim)
        return (autograd.Variable(torch.zeros(1, 1, self.hidden_dim).cuda()),
                autograd.Variable(torch.zeros(1, 1, self.hidden_dim).cuda()))

    def forward(self, embeds):
        # embeds[0] is the pre-computed sentence embedding (a NumPy array)
        embeds = autograd.Variable(torch.from_numpy(embeds[0]).float().cuda())
        # reshape to (seq_len=1, batch=1, embed_dim) before feeding the LSTM
        lstm_output, self.hidden = self.lstm(embeds.view(1, 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_output.view(1, -1))
        scores = F.log_softmax(tag_space, dim=1)
        return scores

And I pass the sentences in the form of embeddings as follows:

model = model.LSTM1(args).cuda()
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-5, momentum=0.9, weight_decay=1e-5)

optimizer.zero_grad()

for epoch in range(20):
    for i in range(len(sentences)):
        optimizer.zero_grad()
        model.hidden = model.init_hidden()
        target = prepare_targets(tag_phrase[i], tag_to_ix, 1)  # Variable (LongTensor) holding the single target label, either 0 or 1
        score = model(sentences[i])  # sentences[i] is the pre-computed embedding of sentence i

        loss = criterion(score,target)
        loss.backward()
        optimizer.step()
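
For reference, prepare_targets is only a small helper that looks up the 0/1 label and wraps it in a LongTensor Variable, roughly along these lines (a rough sketch; the exact code isn't important):

def prepare_targets(tag, tag_to_ix, n):
    # Look up the integer label (0 or 1) for this tag and wrap it in a
    # LongTensor Variable of length n (n is always 1 in the loop above).
    idxs = [tag_to_ix[tag]] * n
    return autograd.Variable(torch.LongTensor(idxs).cuda())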

My doubts:

  1. So the embedding goes into the model, where in the forward function it gets converted to a Variable (FloatTensor), making it an appropriate input for the LSTM. This is my understanding of things. Is this correct?
  2. I am backpropagating after every sentence, so does that effectively make each sentence a separate batch? How do I split the sentences into batches, and what changes would I need to make to the model?
  3. What is the most appropriate way to use pre-trained sentence embeddings in PyTorch?
  4. If this code is otherwise correct, the problem I'm facing is that everything gets classified into only one of the two classes. Any suggestions on how to resolve this?

Thank you.