Clarifying input size to RNN in word_language_model example?

mcskwayrd · February 6, 2017, 3:45am

Hi. I’m trying to understand something…

In the word_language_model example, the network is trained on “data” sequences of length args.bptt, which is 20 words by default (in batches whose size is also 20 by default):
output, hidden = model(data, hidden)

Then, in generate.py, you load the same model via the checkpoint file, but the starting “input” is only one word long:
input = Variable(torch.rand(1, 1).mul(ntokens).long(), volatile=True)
and then you predict a new word via…
output, hidden = model(input, hidden)

How is this possible? If the model is expecting 20 inputs, shouldn’t it produce an error when you try to send it only 1?

Furthermore, when I try to actually send the generation code a sequence of length 20 by creating…

input = corpus.test[0:20]
print("input =",input)

Then I get…

('input = ',
142
78
54
251
2360
405
24
315
706
32
101
934
935
936
874
251
572
5564
2680
34
[torch.LongTensor of size 20]
)
Traceback (most recent call last):
File “generate.py”, line 85, in
output, hidden = model(input, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/home/mcskwayrd/neural/torch/pytorch/examples/word_language_model/model.py”, line 27, in forward
emb = self.encoder(input)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/sparse.py”, line 94, in forward
)(input, self.weight)
RuntimeError: expected a Variable argument, but got LongTensor

And if instead I use the get_batch() method, as it was used in main.py…

corpus = data.Corpus(args.data)
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(1)

def batchify(data, bsz):       # breaks into parallel streams
    nbatch = data.size(0) // bsz
    data = data.narrow(0, 0, nbatch * bsz)
    data = data.view(bsz, -1).t().contiguous()
    if args.cuda:
        data = data.cuda()
    return data

eval_batch_size = 10
test_data = batchify(corpus.test, eval_batch_size)
  
def get_batch(source, i, evaluation=False):
    bptt = 20
    seq_len = min(bptt, len(source) - 1 - i)
    data = Variable(source[i:i+seq_len], volatile=evaluation)
    target = Variable(source[i+1:i+1+seq_len].view(-1))
    return data, target

#input = Variable(torch.rand(1, 1).mul(ntokens).long(), volatile=True)
input, targets = get_batch(test_data, 0, evaluation=True)

Then, when I get to the prediction step (i.e., "output, hidden = model(input, hidden)"), I get the error…

Traceback (most recent call last):
File “generate.py”, line 96, in
output, hidden = model(input, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/home/mcskwayrd/neural/torch/pytorch/examples/word_language_model/model.py”, line 28, in forward
output, hidden = self.rnn(emb, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py”, line 79, in forward
return func(input, self.all_weights, hx)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 228, in forward
return func(input, *fargs, **fkwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 138, in forward
nexth, output = func(input, hidden, weight)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 67, in forward
hy, output = inner(input, hidden[l], weight[l])
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 96, in forward
hidden = inner(input[i], hidden, *weight)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 22, in LSTMCell
gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py”, line 748, in add
return self.add(other)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py”, line 288, in add
return self._add(other, False)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py”, line 282, in _add
return Add(inplace)(self, other)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py”, line 13, in forward
return a.add(b)
RuntimeError: inconsistent tensor size at /home/soumith/local/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:601

Confused: How does sending 20-word sequences work in main.py but fail in generate.py?

PS- I see the documentation for torch.nn.RNN says input is supposed to be a Tensor, but that’s just what I’m sending. It didn’t say anything about needing a Variable or other “matrix”:

input (seq_len, batch, input_size): tensor containing the features of the input sequence.

Thanks!

apaszke · February 6, 2017, 12:13pm

Yes, the documentation is wrong - all these arguments should be torch.autograd.Variables.

The confusion comes from the difference between dynamic and static graph frameworks. PyTorch constructs the graph anew on every forward pass, so it doesn’t need to know in advance what sequence length you will be using with the RNN. The only arguments you have to pass to the RNN constructor are the number of input features and the hidden layer size. You can then use sequences of different lengths at every iteration, and it should work just fine.
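To make the point above concrete, here is a minimal sketch of one LSTM accepting sequences of different lengths. It assumes a current PyTorch release, where tensors are passed directly instead of being wrapped in Variable; the layer sizes and the run helper are made up for illustration and are not taken from the example repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Only the feature sizes are fixed at construction time; sequence length is not.
rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=2)

def run(seq_len, batch_size):
    # Input layout is (seq_len, batch, input_size).
    x = torch.randn(seq_len, batch_size, 8)
    output, (h_n, c_n) = rnn(x)  # hidden state defaults to zeros when omitted
    return output.shape, h_n.shape

# The same module handles a 20-step sequence and a 1-step sequence.
long_out, long_h = run(seq_len=20, batch_size=3)
short_out, short_h = run(seq_len=1, batch_size=3)
print(long_out, short_out, long_h, short_h)
```

The output keeps one row per time step, while the returned hidden state has the same shape regardless of sequence length.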

One tip that can lower memory usage: before training, forward a fake batch that is the size of the longest sequence. This will allow our CUDA allocator to preallocate memory that can be reused for all (smaller) batches.

mcskwayrd · February 6, 2017, 7:54pm

Thanks for writing back Adam. So this is the “flexible input size” feature I’ve been hearing so much about. Great!

If I may ask a related question then: if I actually wanted to try to start the generator code using a sequence that is 20 timesteps long, using data from the “test” dataset as in the two attempts I listed above, how would you make it so that “model” would accept that input?

I tried converting “input” to a variable (in generate.py)…

eval_batch_size = 10
test_data = batchify(corpus.test, eval_batch_size)

def get_batch(source, i, evaluation=False):
    bptt = 20
    seq_len = min(bptt, len(source) - 1 - i)
    data = Variable(source[i:i+seq_len], volatile=evaluation)
    target = Variable(source[i+1:i+1+seq_len].view(-1))
    return data, target

input, target = get_batch(test_data, 0, evaluation=True)
#input = Variable(torch.rand(1, 1).mul(ntokens).long(), volatile=True)

input = Variable(input, volatile=True)

…but when I do that, I get the error that it’s already a Variable (presumably because of the cast in batchify):

Traceback (most recent call last):
File “generate.py”, line 83, in
input = Variable(input, volatile=True)
RuntimeError: Variable data has to be a tensor, but got Variable

But if it’s already a Variable, then I don’t understand why I can’t use it as an input to “model” further below. (?)

Because if I don’t include that extra “Variable” re-casting, then still I get “inconsistent tensor size”…

Traceback (most recent call last):
File “generate.py”, line 90, in
output, hidden = model(input, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/home/mcskwayrd/neural/torch/pytorch/examples/word_language_model/model.py”, line 28, in forward
output, hidden = self.rnn(emb, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py”, line 79, in forward
return func(input, self.all_weights, hx)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 228, in forward
return func(input, *fargs, **fkwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 138, in forward
nexth, output = func(input, hidden, weight)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 67, in forward
hy, output = inner(input, hidden[l], weight[l])
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 96, in forward
hidden = inner(input[i], hidden, *weight)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 22, in LSTMCell
gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py”, line 748, in add
return self.add(other)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py”, line 288, in add
return self._add(other, False)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py”, line 282, in _add
return Add(inplace)(self, other)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py”, line 13, in forward
return a.add(b)
RuntimeError: inconsistent tensor size at /home/soumith/local/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:601

What is the source of this inconsistency, if the input length isn’t supposed to matter?

apaszke · February 7, 2017, 1:23am

You shouldn’t rewrap the input into Variable again. get_batch already does it for you. It’s weird that you’re getting that error though. It seems that there’s some problem with the network definition. Are you using a model trained with main.py earlier?

mcskwayrd · February 7, 2017, 5:01am

Ok, removed the rewrap.

Yes, I’ve run main.py which finishes and saves a model.pt file, then I immediately run generate.py.
The only difference is, I replaced one line in generate.py (“input = Variable”) with code borrowed from main.py:

corpus = data.Corpus(args.data)
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(1)
# input = Variable(torch.rand(1, 1).mul(ntokens).long(), volatile=True)
def batchify(data, bsz):       # breaks into parallel streams
    nbatch = data.size(0) // bsz
    data = data.narrow(0, 0, nbatch * bsz)
    data = data.view(bsz, -1).t().contiguous()
    if args.cuda:
        data = data.cuda()
    return data
eval_batch_size = 10
test_data = batchify(corpus.test, eval_batch_size)
def get_batch(source, i, evaluation=False):
    bptt = 20
    seq_len = min(bptt, len(source) - 1 - i)
    data = Variable(source[i:i+seq_len], volatile=evaluation)
    target = Variable(source[i+1:i+1+seq_len].view(-1))
    return data, target
input, target = get_batch(test_data, 0, evaluation=True)

…and the rest of generate.py is unchanged from your original.

Running this version of the code produces an error about hidden size…

Traceback (most recent call last):
File “generate.py”, line 93, in
output, hidden = model(input, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/home/mcskwayrd/neural/torch/pytorch/examples/word_language_model/model.py”, line 28, in forward
output, hidden = self.rnn(emb, hidden)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py”, line 210, in call
result = self.forward(*input, **kwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py”, line 79, in forward
return func(input, self.all_weights, hx)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 228, in forward
return func(input, *fargs, **fkwargs)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/function.py”, line 202, in _do_forward
flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
File “/usr/local/lib/python2.7/dist-packages/torch/autograd/function.py”, line 218, in forward
result = self.forward_extended(*nested_tensors)
File “/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py”, line 180, in forward_extended
cudnn.rnn.forward(self, input, hx, weight, output, hy)
File “/usr/local/lib/python2.7/dist-packages/torch/backends/cudnn/rnn.py”, line 244, in forward
hidden_size, tuple(hx.size())))
RuntimeError: Expected hidden size (2, 10L, 200), got (2L, 1L, 200L)

It seems that it wants the hidden size to somehow follow the batch size, only with no "L"s…

apaszke · February 7, 2017, 6:04pm

Ah, I see the problem. You’ve increased the batch size of the input, but you haven’t changed the hidden = ... part, so the hidden state is too small. It’s not only the Ls that differ: the second dimension was expected to be 10, but is 1.
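The shape rule behind this error can be reproduced in isolation. The sketch below assumes a current PyTorch release (plain tensors, no Variable); init_hidden here is a hypothetical stand-in for the model's own method, and the sizes simply mirror the (2, 10, 200) in the error message.

```python
import torch
import torch.nn as nn

num_layers, hidden_size, emb_size = 2, 200, 200
rnn = nn.LSTM(emb_size, hidden_size, num_layers)

def init_hidden(batch_size):
    # One (h, c) pair, each shaped (num_layers, batch, hidden_size), all zeros.
    shape = (num_layers, batch_size, hidden_size)
    return torch.zeros(shape), torch.zeros(shape)

x = torch.randn(20, 10, emb_size)  # 20 time steps, batch of 10

mismatch_raised = False
try:
    rnn(x, init_hidden(1))  # hidden state built for batch size 1
except RuntimeError as err:
    mismatch_raised = True
    print("mismatch:", err)

output, (h_n, c_n) = rnn(x, init_hidden(10))  # batch sizes agree
print(output.shape, h_n.shape)
```

The sequence length (first dimension of the input) is free to vary, but the batch dimension of the hidden state has to match the batch dimension of the input.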

mcskwayrd · February 7, 2017, 11:09pm

Right, ok. I needed to set eval_batch_size=1, and I can keep hidden=model.init_hidden(1). That makes the dimensions agree.

The only issue is that “output” then ends up being [20x1x10000] instead of [1x1x10000] like the remainder of the code expects. So I grab only the last element of output via

output = output[-1]

The following, then, is some working code for generate.py that feeds it an initial sequence of length 20! Thanks for your help!

###############################################################################
# Language Modeling on Penn Tree Bank
#
# This file generates new sentences sampled from the language model
#
###############################################################################

import argparse
import time
import math

import torch
import torch.nn as nn
from torch.autograd import Variable

import data

parser = argparse.ArgumentParser(description='PyTorch PTB Language Model')

# Model parameters.
parser.add_argument('--data', type=str, default='./data/penn',
                    help='location of the data corpus')
parser.add_argument('--checkpoint', type=str, default='./model.pt',
                    help='model checkpoint to use')
parser.add_argument('--outf', type=str, default='generated.txt',
                    help='output file for generated text')
parser.add_argument('--words', type=int, default=1000,
                    help='number of words to generate')
parser.add_argument('--seed', type=int, default=1111,
                    help='random seed')
parser.add_argument('--cuda', action='store_true',
                    help='use CUDA')
parser.add_argument('--temperature', type=float, default=1.0,
                    help='temperature - higher will increase diversity')
parser.add_argument('--log-interval', type=int, default=100,
                    help='reporting interval')
args = parser.parse_args()

# Set the random seed manually for reproducibility.
torch.manual_seed(args.seed)
if torch.cuda.is_available():
    if not args.cuda:
        print("WARNING: You have a CUDA device, so you should probably run with --cuda")
    else:
        torch.cuda.manual_seed(args.seed)

if args.temperature < 1e-3:
    parser.error("--temperature has to be greater than or equal to 1e-3")

with open(args.checkpoint, 'rb') as f:
    model = torch.load(f)

if args.cuda:
    model.cuda()
else:
    model.cpu()


def batchify(data, bsz):       # breaks into parallel streams
    nbatch = data.size(0) // bsz
    data = data.narrow(0, 0, nbatch * bsz)
    data = data.view(bsz, -1).t().contiguous()
    if args.cuda:
        data = data.cuda()
    return data

corpus = data.Corpus(args.data)
ntokens = len(corpus.dictionary)
eval_batch_size = 1
test_data = batchify(corpus.test, eval_batch_size)
hidden = model.init_hidden(1)
def get_batch(source, i, evaluation=False):
    bptt = 20
    seq_len = min(bptt, len(source) - 1 - i)
    data = Variable(source[i:i+seq_len], volatile=evaluation)
    target = Variable(source[i+1:i+1+seq_len].view(-1))
    return data, target

input, target = get_batch(test_data, 0, evaluation=True)
#input = Variable(torch.rand(1, 1).mul(ntokens).long(), volatile=True)
#print("input = ",input)

if args.cuda:
    input.data = input.data.cuda()

with open(args.outf, 'w') as outf:
    for i in range(args.words):
        output, hidden = model(input, hidden)
        output = output[-1]
#        print("output = ",output)
        word_weights = output.squeeze().data.div(args.temperature).exp().cpu()
        word_idx = torch.multinomial(word_weights, 1)[0]
        input.data.fill_(word_idx)
        word = corpus.dictionary.idx2word[word_idx]

        outf.write(word + ('\n' if i % 20 == 19 else ' '))

        if i % args.log_interval == 0:
            print('| Generated {}/{} words'.format(i, args.words))

print(" ")

apaszke · February 8, 2017, 12:20am

Actually, that might not be what you want. You want to pass the long input only once, to initialize the network, and then do the steps one by one. In this example you’ll forward a sequence of 20 words from the data, and then on every later step you’ll be feeding in 20 time steps again, with the last sampled word as the input at each of them (so it is applied 20 times). You should forward the batch through the network only once and slice off the last hidden state. Then, use that slice with an input of length 1 to generate the data.
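A rough sketch of that prime-once, then-step pattern follows. It is not the example's actual generate.py: TinyLM is a hypothetical stand-in for the trained model, the priming tokens are random rather than taken from corpus.test, and it uses current PyTorch idioms (torch.no_grad() in place of volatile=True). Note that nn.LSTM already returns the hidden state after the final time step, so no extra slicing of the hidden state is needed here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyLM(nn.Module):
    """Hypothetical stand-in for the example's model: encoder -> LSTM -> decoder."""
    def __init__(self, ntokens=50, emsize=16, nhid=32, nlayers=2):
        super().__init__()
        self.encoder = nn.Embedding(ntokens, emsize)
        self.rnn = nn.LSTM(emsize, nhid, nlayers)
        self.decoder = nn.Linear(nhid, ntokens)

    def forward(self, tokens, hidden=None):
        output, hidden = self.rnn(self.encoder(tokens), hidden)
        return self.decoder(output), hidden  # scores: (seq_len, batch, ntokens)

model = TinyLM().eval()
prime = torch.randint(0, 50, (20, 1))  # stand-in for 20 words of test data
temperature = 1.0
generated = []

with torch.no_grad():
    # 1) Forward the whole priming sequence once to warm up the hidden state.
    output, hidden = model(prime)
    logits = output[-1]  # scores after the last primed word, shape (1, ntokens)
    for _ in range(30):
        # 2) Sample one word, then feed only that word back in as a (1, 1) input.
        weights = logits.squeeze().div(temperature).exp()
        word_idx = torch.multinomial(weights, 1)  # shape (1,)
        generated.append(word_idx.item())
        output, hidden = model(word_idx.view(1, 1), hidden)
        logits = output[-1]

print(generated)
```

After the priming pass, each loop iteration processes exactly one time step, and the hidden state carries the context forward.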

mcskwayrd · February 8, 2017, 3:20am

Oh, I see. Tensor.data.fill_() just repeats that last value over & over throughout the tensor.
So in my code,
input.data.fill_(word_idx)
was just taking that one value and repeating it.

Got it. I need to re-size input after the first iteration of the loop. I’ll work on that… Thanks.
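For anyone following along, the fill_ behaviour described above is easy to check on its own. This is only an illustrative sketch with made-up values, written against a current PyTorch release:

```python
import torch

inp = torch.arange(20).view(20, 1)  # stand-in for the 20-word priming input
inp.fill_(7)                        # in-place: all 20 entries become 7
all_same = bool((inp == 7).all())

# Building a fresh (1, 1) tensor for the sampled word avoids the repetition.
word_idx = 7
next_inp = torch.tensor([[word_idx]])
print(inp.shape, all_same, next_inp.shape)
```

So fill_ keeps the (20, 1) shape and overwrites every entry, whereas a new (1, 1) tensor feeds the model a single time step.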