Generating text with AdaptiveSoftmax

I’ve trained a language model with nn.AdaptiveLogSoftmaxWithLoss as the decoder.
Now I’m having a hard time generating text.
The output of AdaptiveLogSoftmaxWithLoss is the log probability computed for the given targets, and I’m not sure how to convert this into a predicted word index.
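
If I understand the docs correctly, the forward call returns a named tuple whose output field holds the log probability of each given target and whose loss field is their mean, so there is no full distribution to take an argmax over. Roughly (hidden_states and targets here are just placeholder names):

out = model.decoder(hidden_states, targets)   # hidden_states: (N, nhid), targets: (N,)
print(out.output.shape)                       # (N,) -- log probability assigned to each target word
print(out.loss)                               # scalar -- mean negative log likelihood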

Also, I wanted to use the predict() method, which returns the predicted class. But I’m not sure how to call it, since the input it takes should have hidden_dim in the last dimension so that it can be passed to the head cluster.
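
For reference, my understanding is that predict() expects a 2-D tensor of shape (N, hidden_dim) and returns the (N,) highest-probability word indices, something like this (rnn_out is a placeholder):

rnn_out = torch.randn(3, nhid)                 # e.g. 3 time steps of flattened RNN output
word_indices = model.decoder.predict(rnn_out)  # shape (3,), argmax word index per step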

So I loaded the model in generate.py, initialized an input and an output tensor, and passed the data through the layers one by one, as shown below. This does generate text, but with repeating patterns.

with open(args.checkpoint, 'rb') as f:
    model = torch.load(f).to(device)
model.eval()

corpus = data.Corpus(args.data)
ntokens = len(corpus.dictionary)

hidden = model.init_hidden(1)
input = torch.randint(ntokens, (1, 1), dtype=torch.long).to(device)
targets = torch.randint(ntokens, (1, 1), dtype=torch.long).view(-1).to(device)  # not actually used below

with open(args.outf, 'w') as outf:
    with torch.no_grad(): 
        for i in range(args.words):
            emb = model.encoder(input)                   # (1, 1, ninp) embedding of the current word
            output, hidden = model.rnn(emb, hidden)      # (1, 1, nhid) LSTM output
            output = output.view(-1, output.size(2))     # flatten to (1, nhid) for the adaptive softmax
            result = model.decoder.predict(output)       # (1,) predicted word index (argmax)
            word_idx = result[0]

            input.fill_(word_idx)                        # feed the prediction back as the next input
            word = corpus.dictionary.idx2word[word_idx]

            outf.write(word + ('\n' if i % 20 == 19 else ' '))

            if i % args.log_interval == 0:
                print('| Generated {}/{} words'.format(i, args.words))


Here is the model definition:

class AdaptiveSoftmaxRNN(nn.Module):
    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5, cutoffs=[5000, 10000, 50000, 100000]):
        super(AdaptiveSoftmaxRNN, self).__init__()
        self.ntoken = ntoken
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.AdaptiveLogSoftmaxWithLoss(nhid, ntoken, cutoffs=cutoffs, div_value=2.0)
        self.init_weights()
        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden, targets):
        emb = self.drop(self.encoder(input))          # (seq_len, bsz, ninp)
        output, hidden = self.rnn(emb, hidden)        # (seq_len, bsz, nhid)
        output = self.drop(output)
        output = output.view(-1, output.size(2))      # flatten to (seq_len * bsz, nhid)

        output, loss = self.decoder(output, targets)  # output: target log probs, loss: mean NLL
        return output, hidden, loss

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                weight.new_zeros(self.nlayers, bsz, self.nhid))
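
In case it helps, the forward pass is used during training roughly like this (batch_size, train_batches, and optimizer below are placeholders for my actual training code):

hidden = model.init_hidden(batch_size)
for data, targets in train_batches:                  # data: (seq_len, bsz), targets: (seq_len * bsz,)
    model.zero_grad()
    output, hidden, loss = model(data, hidden, targets)
    loss.backward()
    optimizer.step()
    hidden = tuple(h.detach() for h in hidden)       # detach so gradients don't flow across batches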

Also, if I want to use the predict() or log_prob() methods of AdaptiveLogSoftmaxWithLoss, what is the proper way? Should I call them in the forward method of my model definition?
And is it okay to load the model this way and pass data through its layers directly, given that the layers are already trained?
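
For example, would something along these lines be the intended way to use log_prob() to sample the next word instead of always taking the argmax (output here is the flattened (1, nhid) RNN output from the loop above)?

log_probs = model.decoder.log_prob(output)               # (1, ntokens) log probability of every word
word_idx = torch.multinomial(log_probs.exp(), 1).item()  # sample one word index from the distribution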