One-to-one LSTM input format unclear, undocumented `threshold()` argument error

Hello everyone, I’m new to PyTorch and am having some issues with a simple LSTM.

What I am trying to do is create an LSTM that will accept a 2-element tensor during inference, update its internal state, and return a single number (or a 1-element tensor). During inference, I will never be providing sequences, only data one element at a time. I assume this is a fairly common use case.

However, I am using sequences for training. I am providing an x tensor of the format (20, 1, 2), where 20 is the sequence length, 1 is the batch size, and 2 is the 2-element input tensor. y is similarly (20, 1, 1) because the output is 1 single element. What I expect to happen is that it will train one element at a time.

Is this a correct assumption? The other possibility is that you train with an entire sequence as input and the single next element as output, but that does not really work here, since the inputs and outputs are completely different kinds of data. My use case is not like language translation, in which many inputs are translated into many different outputs, nor is it like predicting the next single letter in a sentence. Rather, I am using sequences of line equations (two elements of the form y=mx+b) as x, and using corresponding sequences of steering angles (for a self-driving car) as y. I may be doing this the wrong way; please correct me, as I am new to PyTorch and LSTMs (but not to machine learning).

I have attempted a fairly simple implementation of the above idea in Python 2, as follows. (The data loading code is obviously different in my real code, but the essence is the same.)

import torch
from torch import autograd, nn, optim

# Create random training data
x = autograd.Variable(torch.randn(20, 1, 2))
y = autograd.Variable(torch.randn(20, 1, 1))

# Create a model and use the mean squared error loss function with the Adadelta optimizer
model = nn.Sequential(
    nn.LSTM(input_size=2, hidden_size=10, num_layers=1),
    nn.ReLU(),
    nn.Linear(10, 4),
    nn.Linear(4, 1)
)
loss_function = nn.MSELoss()
optimizer = optim.Adadelta(model.parameters())

# Train the network one epoch at a time
for epoch in range(100):
    # Compute the predictions by passing the entire training sequence to the network
    predictions = model(x)
    # Compute and print the loss using the predictions
    loss = loss_function(predictions, y)
    print(loss.data[0])
    # Zero the gradients for the variables that will be updated
    optimizer.zero_grad()
    # Run backpropagation, calculating gradients for each of the trainable parameters
    loss.backward()
    # Update the parameters using the optimizer
    optimizer.step()
In any case, when I attempt to run the above code, I get a completely undocumented error message:

Traceback (most recent call last):
  File "", line 24, in <module>
    predictions = model(x)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/", line 67, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/", line 43, in forward
    return F.threshold(input, self.threshold, self.value, self.inplace)
RuntimeError: threshold(): argument 'input' (position 1) must be Variable, not tuple

Hopefully I am doing something stupid that can be easily corrected. Thank you in advance for your help. I would be glad to provide more information or clarify if necessary!

Don't put your LSTM in Sequential; it won't work.

See the documentation of LSTM for why it won't work: LSTM takes the input plus an optional tuple of state Variables (h_0, c_0), and Sequential can only pass a single Variable from one module to the next.
The output of LSTM is a tuple (the output Variable plus the tuple of final hidden and cell states), but the input to ReLU must be a single Variable, not a tuple.
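
One way to follow this advice is to wrap the LSTM in a small custom module and unpack its return value explicitly. The following is only a sketch under some assumptions: the class name `SteeringLSTM` is hypothetical, the layer sizes are copied from the question, and it targets Python 3 with a recent PyTorch in which plain tensors have replaced `Variable`.

```python
import torch
from torch import nn


class SteeringLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=2, hidden_size=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=1)
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden_size, 4),
            nn.Linear(4, 1),
        )

    def forward(self, x, state=None):
        # nn.LSTM returns (output, (h_n, c_n)); only `output` is fed to the head.
        output, state = self.lstm(x, state)
        return self.head(output), state


model = SteeringLSTM()
x = torch.randn(20, 1, 2)          # (seq_len, batch, input_size)
predictions, state = model(x)      # predictions: one value per time step
```

Returning `state` alongside the predictions keeps the option of carrying it into the next call, which a plain Sequential cannot do.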


Thank you! This fixed my problem, and the network is now training successfully. I am actually still using Sequential, but created a TakeFirst module that returns the first element of whatever is passed to it, as guided by this answer: Sequential LSTM II
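
A module of the kind described might look roughly like the sketch below. Only the name `TakeFirst` comes from the post; the implementation is an assumption, and the code is written for Python 3 with a recent PyTorch in which plain tensors have replaced `Variable`.

```python
import torch
from torch import nn


class TakeFirst(nn.Module):
    """Return the first element of the tuple passed in."""

    def forward(self, inputs):
        return inputs[0]


model = nn.Sequential(
    nn.LSTM(input_size=2, hidden_size=10, num_layers=1),
    TakeFirst(),   # keeps the per-step outputs, drops (h_n, c_n)
    nn.ReLU(),
    nn.Linear(10, 4),
    nn.Linear(4, 1),
)

x = torch.randn(20, 1, 2)   # (seq_len, batch, input_size)
predictions = model(x)
```

Because the final state is discarded and no initial state is passed in, every call to this model starts the LSTM from a zero state.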

One thing that is still unclear to me: how do you clear the LSTM’s memory? As I understand it, loss.backward() does this if you do not set retain_graph to True. How do you manually clear the LSTM’s memory during inference? Moreover, will the state be retained across inference runs?

During inference, you can use volatile=True on the input Variable.
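
On the state question: an `nn.LSTM` module does not keep its hidden state between calls. State persists only if the returned `(h_n, c_n)` tuple is passed back in, and passing `None` starts again from zeros. The sketch below illustrates this for one-element-at-a-time inference. It assumes Python 3 and a recent PyTorch, where `volatile=True` has been removed and `torch.no_grad()` plays that role; the variable names are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=2, hidden_size=10, num_layers=1)
sequence = torch.randn(20, 1, 2)   # (seq_len, batch, input_size)

with torch.no_grad():              # no autograd graph is built during inference
    # Whole sequence in one call, starting from a zero state.
    full_output, _ = lstm(sequence)

    # One element at a time, carrying the state forward by hand.
    state = None                   # None means "start from a zero state"
    step_outputs = []
    for step in sequence:          # each step has shape (1, 2)
        out, state = lstm(step.unsqueeze(0), state)
        step_outputs.append(out)
    stepwise_output = torch.cat(step_outputs, dim=0)

    # "Clearing the memory" is just dropping the carried state.
    state = None
    restarted, state = lstm(sequence[:1], state)

same = torch.allclose(full_output, stepwise_output, atol=1e-5)
```

If the two ways of feeding the data agree, as the `same` check is meant to confirm, then training on whole sequences and running inference one element at a time are consistent, provided the state is carried between steps.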
