I’ve trained a language model with AdaptiveLogSoftmaxWithLoss as the decoder.
Now I’m having a hard time generating text.
The output of AdaptiveLogSoftmaxWithLoss is the log probability computed for each target. I’m not sure how to convert this to a predicted word index.
I also wanted to use the predict() method, which returns the predicted class. But I’m not sure how to use it, since the input it takes must have hidden_dim as its last dimension so that it can be passed to the head cluster.
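For reference, here is a minimal sketch of the shapes the three entry points of nn.AdaptiveLogSoftmaxWithLoss work with. The sizes (hidden_dim=16, 100 classes, cutoffs [10, 50]) are toy values chosen for illustration and are not the ones from my model. The point is that forward() needs targets and only returns the log probability of each target, while log_prob() and predict() take just the (N, hidden_dim) features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes for illustration only (assumptions, not the real model's values).
hidden_dim, n_classes = 16, 100
asm = nn.AdaptiveLogSoftmaxWithLoss(hidden_dim, n_classes, cutoffs=[10, 50], div_value=2.0)
asm.eval()

# Stand-in for the flattened RNN output: one row per position, hidden_dim last.
hidden_states = torch.randn(4, hidden_dim)
targets = torch.randint(n_classes, (4,))

with torch.no_grad():
    out = asm(hidden_states, targets)        # named tuple with fields (output, loss)
    log_probs = asm.log_prob(hidden_states)  # full distribution, shape (N, n_classes)
    predicted = asm.predict(hidden_states)   # most likely class per row, shape (N,)

print(out.output.shape, log_probs.shape, predicted.shape)
```

So the tensor passed to predict() or log_prob() is the same flattened RNN output that forward() receives, just without the targets.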
So I loaded the model in generate.py, initialized an input and an output tensor, and finally passed them through the layers one by one, as follows. This generates text, but with repeating patterns.
with open(args.checkpoint, 'rb') as f:
    model = torch.load(f).to(device)
model.eval()
corpus = data.Corpus(args.data)
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(1)
input = torch.randint(ntokens, (1, 1), dtype=torch.long).to(device)
targets = torch.randint(ntokens, (1, 1), dtype=torch.long).view(-1).to(device)
with open(args.outf, 'w') as outf:
    with torch.no_grad():
        for i in range(args.words):
            emb = model.encoder(input)
            output, hidden = model.rnn(emb, hidden)
            output = output.view(-1, output.size(2))
            result = model.decoder.predict(output)
            word_idx = result[0]
            input.fill_(word_idx)
            word = corpus.dictionary.idx2word[word_idx]
            outf.write(word + ('\n' if i % 20 == 19 else ' '))
            if i % args.log_interval == 0:
                print('| Generated {}/{} words'.format(i, args.words))
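One possible reason for the repetition is that predict() always returns the argmax, so the loop above is greedy decoding, which tends to fall into cycles. A common alternative is to sample from the distribution returned by log_prob(). The following is only a sketch under that assumption: sample_next_word and its temperature parameter are hypothetical names, and the small untrained modules stand in for the loaded model.

```python
import torch
import torch.nn as nn

def sample_next_word(decoder, rnn_output, temperature=1.0):
    """Sample one class index per row instead of taking the argmax.

    Hypothetical helper: decoder is an nn.AdaptiveLogSoftmaxWithLoss and
    rnn_output has shape (N, hidden_dim).
    """
    log_probs = decoder.log_prob(rnn_output)                   # (N, ntokens)
    weights = torch.softmax(log_probs / temperature, dim=-1)   # temperature < 1 sharpens
    return torch.multinomial(weights, num_samples=1).squeeze(1)

torch.manual_seed(0)

# Toy, untrained stand-ins for model.encoder, model.rnn and model.decoder.
ntokens, nhid = 100, 16
encoder = nn.Embedding(ntokens, nhid)
rnn = nn.LSTM(nhid, nhid, 1)
decoder = nn.AdaptiveLogSoftmaxWithLoss(nhid, ntokens, cutoffs=[10, 50], div_value=2.0)
for module in (encoder, rnn, decoder):
    module.eval()

inp = torch.randint(ntokens, (1, 1), dtype=torch.long)
hidden = None  # nn.LSTM starts from zero states when hidden is None
generated = []
with torch.no_grad():
    for _ in range(20):
        output, hidden = rnn(encoder(inp), hidden)
        output = output.view(-1, output.size(2))
        word_idx = sample_next_word(decoder, output, temperature=1.0)[0].item()
        inp.fill_(word_idx)          # feed the sampled word back in
        generated.append(word_idx)

print(generated)
```

In the generation loop, this would replace the predict() call, and .item() gives a plain Python int for the idx2word lookup.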
Here is the model definition:
class AdaptiveSoftmaxRNN(nn.Module):
    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5, cutoffs=[5000, 10000, 50000, 100000]):
        super(AdaptiveSoftmaxRNN, self).__init__()
        ntoken = ntoken
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.AdaptiveLogSoftmaxWithLoss(nhid, ntoken, cutoffs=cutoffs, div_value=2.0)
        self.init_weights()
        self.nhid = nhid
        self.nlayers = nlayers
    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
    def forward(self, input, hidden, targets):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        output = output.view(-1,output.size(2))
        output, loss = self.decoder(output, targets)
        return output, hidden, loss
    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                weight.new_zeros(self.nlayers, bsz, self.nhid))
Also, if I want to use the predict() or log_prob() method of AdaptiveLogSoftmaxWithLoss, what is the proper way? Should I call them in the forward method of my model definition?
And is it okay to load the model, access its layers directly, and pass data through them, given that the layers are already trained?
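To make the question concrete, one arrangement I can imagine is keeping forward() for training (where targets are available) and adding a separate method for inference that calls decoder.log_prob(). This is only a sketch: the _features helper and the log_prob method on the model are hypothetical additions, and the small sizes are toy values. The submodules of a loaded model are ordinary nn.Module objects holding the trained parameters, and in eval() mode nn.Dropout is the identity, so skipping self.drop at generation time gives the same result as applying it.

```python
import torch
import torch.nn as nn

class AdaptiveSoftmaxRNN(nn.Module):
    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5, cutoffs=(10, 50)):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.AdaptiveLogSoftmaxWithLoss(nhid, ntoken, cutoffs=list(cutoffs), div_value=2.0)
        self.nhid = nhid
        self.nlayers = nlayers

    def _features(self, input, hidden):
        # Hypothetical helper: shared encoder + RNN path, flattened to (N, nhid).
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        return output.view(-1, output.size(2)), hidden

    def forward(self, input, hidden, targets):
        # Training path: needs targets, returns target log probabilities and loss.
        features, hidden = self._features(input, hidden)
        output, loss = self.decoder(features, targets)
        return output, hidden, loss

    def log_prob(self, input, hidden):
        # Hypothetical inference path: full (N, ntoken) log distribution, no targets.
        features, hidden = self._features(input, hidden)
        return self.decoder.log_prob(features), hidden

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                weight.new_zeros(self.nlayers, bsz, self.nhid))

torch.manual_seed(0)
model = AdaptiveSoftmaxRNN(ntoken=100, ninp=8, nhid=16, nlayers=2)  # toy sizes
model.eval()

inp = torch.randint(100, (1, 1), dtype=torch.long)
with torch.no_grad():
    log_probs, hidden = model.log_prob(inp, model.init_hidden(1))
    output, _, loss = model(inp, model.init_hidden(1), torch.randint(100, (1,)))

print(log_probs.shape, output.shape, loss.item())
```

With this layout the generation script would call model.log_prob(input, hidden) instead of reaching into model.encoder, model.rnn and model.decoder one by one.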