The input_size argument to any RNN specifies how many features each step of a sequence has, not how long the sequence is. Keras uses static graphs, so it needs to know the sequence length up front; PyTorch uses dynamic autodifferentiation, so it doesn’t care about the sequence length - you can use a different one every iteration.

See the GRU docs for more details on the arguments.
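A minimal sketch of that point (the sizes here are arbitrary, and this assumes a modern PyTorch where Variable is no longer needed):

```python
import torch
import torch.nn as nn

# A GRU whose input_size is 10: each time step carries 10 features.
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=1)

# The sequence length can differ on every call; only the last
# (feature) dimension must stay fixed at input_size.
for seq_len in (5, 12, 30):
    x = torch.randn(seq_len, 3, 10)   # (seq_len, batch, input_size)
    out, h = gru(x)
    assert out.shape == (seq_len, 3, 20)
```

The same module object handles all three sequence lengths without any reconfiguration.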

Thanks for your help. As I wrote above, the script runs, “literally”, but the loss doesn’t decrease over the epochs, so please give me some advice. I think the relevant parts are:

class Net(nn.Module):
    def __init__(self, features, cls_size):
        super(Net, self).__init__()
        self.rnn1 = nn.GRU(input_size=features,
                           hidden_size=hidden_size,
                           num_layers=1)
        self.dense1 = nn.Linear(hidden_size, cls_size)

    def forward(self, x, hidden):
        x, hidden = self.rnn1(x, hidden)
        x = x.select(0, maxlen - 1).contiguous()
        x = x.view(-1, hidden_size)
        x = F.softmax(self.dense1(x))
        return x, hidden

    def init_hidden(self, batch_size=batch_size):
        weight = next(self.parameters()).data
        return Variable(weight.new(1, batch_size, hidden_size).zero_())

def var(x):
    x = Variable(x)
    if cuda:
        return x.cuda()
    else:
        return x
model = Net(features=features, cls_size=len(chars))
if cuda:
model.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
def train():
    model.train()
    hidden = model.init_hidden()
    for epoch in range(len(sentences) // batch_size):
        X_batch = var(torch.FloatTensor(X[:, epoch*batch_size: (epoch+1)*batch_size, :]))
        y_batch = var(torch.LongTensor(y[epoch*batch_size: (epoch+1)*batch_size]))
        model.zero_grad()
        output, hidden = model(X_batch, var(hidden.data))
        loss = criterion(output, y_batch)
        loss.backward()
        optimizer.step()

for epoch in range(nb_epochs):
    train()

The input is a “one-hot” vector, and I tried changing the learning rate, but the result is the same.

Finally I found that I had misused the loss function torch.nn.CrossEntropyLoss: it expects raw, unnormalized scores because it applies log-softmax internally, but my forward pass was already applying a softmax. After changing the loss to nn.NLLLoss(log_softmax(output), target), the loss decreases as expected.
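The relationship between the two losses can be checked directly (a small sketch with made-up logits and targets):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw scores, no softmax applied
target = torch.tensor([1, 0, 3, 9])

# CrossEntropyLoss == log_softmax followed by NLLLoss, on raw logits.
ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
assert torch.allclose(ce, nll)

# Feeding softmax probabilities into CrossEntropyLoss squashes the
# scores twice, which flattens the gradients - hence the stuck loss.
```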

Yup, that looks good! Note that you can now pass hidden = None in the first iteration; the RNN will initialize a zero-filled hidden state for you. You might need to update PyTorch, though.
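To illustrate (layer sizes here are arbitrary; this assumes a PyTorch recent enough to accept a None hidden state):

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)
x = torch.randn(5, 3, 10)

# Passing None (or simply omitting the hidden state) makes the RNN
# build a zero-filled initial state of the right shape internally,
# so it is equivalent to handing in an explicit zeros tensor.
out_a, h_a = gru(x, None)
out_b, h_b = gru(x, torch.zeros(1, 3, 20))
assert torch.allclose(out_a, out_b)
```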

I have a question about the number of parameters in an RNN. I defined an RNN layer and retrieved its parameters. I thought the number of parameters in an RNN layer should differ for different input lengths. However, when I used parameters() to get them, the count was the same as for an RNN layer run with only one time step.

Your model is going to be the same whatever the length of your input is.
In Torch we used to clone the model as many times as there were time steps while sharing the parameters, because it is the same model, just unrolled over time.
The number of parameters changes when your input dimensionality changes (the size of x[t], for a given t = 1, ..., T), not when T changes.
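You can verify this by counting parameters directly (a quick sketch; the sizes are made up, and n_params is just a helper defined here):

```python
import torch
import torch.nn as nn

def n_params(module):
    # Total number of scalar parameters in a module.
    return sum(p.numel() for p in module.parameters())

rnn = nn.RNN(input_size=10, hidden_size=20)

# weight_ih: 20*10, weight_hh: 20*20, bias_ih: 20, bias_hh: 20
assert n_params(rnn) == 20*10 + 20*20 + 20 + 20   # 640

# Changing the input dimensionality changes the count...
rnn2 = nn.RNN(input_size=20, hidden_size=20)
assert n_params(rnn2) == 20*20 + 20*20 + 20 + 20

# ...but feeding sequences of any length uses the same parameters.
for seq_len in (1, 50):
    rnn(torch.randn(seq_len, 3, 10))
assert n_params(rnn) == 640
```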

If it is still not clear, you can go over my lectures on RNNs (ref.).
And if it is still confusing, wait for the PyTorch video tutorials I’m currently working on.

For the RNN cell, why does the documentation say the input has shape (batch, input_size), while in the example given in the documentation the input is input = Variable(torch.randn(6, 3, 10))?
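The cell itself consumes one time step of shape (batch, input_size); the (6, 3, 10) tensor in the docs is a whole sequence that the example loops over, one step at a time. A sketch of that pattern (written for a current PyTorch, without Variable):

```python
import torch
import torch.nn as nn

# nn.RNNCell processes ONE time step: input shape (batch, input_size).
cell = nn.RNNCell(input_size=10, hidden_size=20)
h = torch.zeros(3, 20)                # (batch, hidden_size)

x = torch.randn(6, 3, 10)             # (seq_len, batch, input_size)
for t in range(x.size(0)):
    h = cell(x[t], h)                 # x[t] has shape (batch, input_size)
assert h.shape == (3, 20)
```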