Hello! I have a question concerning online learning with PyTorch. Usually, samples are supplied to networks as a list. In online learning, in contrast, training and prediction use a single sample at a time.
With feed-forward networks you can simply set a batch size of one and supply the network with a data set consisting of only one sample. With a recurrent network, however, I need to keep the activation of the recurrent neurons for the next incoming sample.
The pseudocode below illustrates what I want to do with a time series iterator that yields individual samples.
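input_value = 0.
while True:
    # each new sample is the target for the previous input
    target_value = time_series.next()
    output_value = network.train(input_value, target_value)
    error = output_value - target_value
    # the current sample becomes the input for the next step
    input_value = target_value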
I got the impression PyTorch might be appropriate to achieve this with minimal invasion. In every example I found, however, the samples were provided in one big data set.
Can you do this reasonably easily with PyTorch? A code example would be highly appreciated. Thanks a lot!
Your pseudocode is already basically correct - here’s a simplified case:
input = some_input
hidden = model.init_hidden()
for i in range(seq_len):
    input, hidden = model(input, hidden)
# Do something with final states...
Here’s a more thorough example modified from a tutorial:
for i in range(target_length):
    output, hidden = decoder(input, hidden)
    loss += criterion(output, target[i])
    # Create new input from max value
    top_v, top_i = output.data.topk(1)
    top_i = top_i[0][0]  # pull out the index itself, assuming output is 1 x N
    input = Variable(torch.LongTensor([[top_i]]))
In this case the input is an embedding and the output is from log_softmax, so to get the next input you have to create a new input from the maximum value.
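To make the "maximum value" step concrete, here is a small sketch of picking the next input from a log_softmax output. The scores are made up for illustration, and it assumes a recent PyTorch version where plain tensors are used instead of Variable.

```python
import torch

# Made-up scores for a vocabulary of 3 entries, shape 1 x N
scores = torch.tensor([[0.1, 2.0, -1.0]])
log_probs = torch.log_softmax(scores, dim=1)

# topk(1) returns the largest value and its index along the last dimension
top_v, top_i = log_probs.topk(1)

# Reshape the winning index into a 1 x 1 LongTensor to feed back as the next input
next_input = top_i.detach().view(1, 1)
```

Here the index of the largest log-probability (entry 1) becomes the next input, which is why the tutorial code builds a new LongTensor instead of reusing the output directly.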
hello sean! thanks for your answer! unfortunately, i am having a bit of trouble understanding it. first, i was expecting to read unsqueeze somewhere because of the notice box here. second, even after reading the tutorial you reference, i cannot see why the input is a maximum instead of simply the current value in the time series.
if it is possible, a minimal working example would help me out greatly. thanks a lot!
Here’s a working example that uses teacher forcing half of the time, and trains on its own outputs the other half: https://gist.github.com/spro/ef26915065225df65c1187562eca7ec4
You often see unsqueeze because Linear layers expect B x N tensors while RNNs expect S x B x N (sequence, batch, size). In the referenced tutorial the inputs are in a different shape from the outputs (inputs are character indexes in a LongTensor, outputs are probabilities in a FloatTensor) so you have to do some manual work to convert it.
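As a rough illustration of those shapes, the sketch below takes a single sample and adds the missing dimensions with unsqueeze. The layer sizes (4 in, 3 out) are arbitrary choices for the example, and the GRU uses its default layout (batch_first=False).

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 3)                   # expects B x N
rnn = nn.GRU(input_size=4, hidden_size=3)  # expects S x B x N by default

sample = torch.randn(4)       # a single sample, shape N

batch = sample.unsqueeze(0)   # B x N = 1 x 4
linear_out = linear(batch)    # 1 x 3

step = batch.unsqueeze(0)     # S x B x N = 1 x 1 x 4
rnn_out, hidden = rnn(step)   # both 1 x 1 x 3 for a single-layer GRU
```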
maybe we have a different understanding of online learning? i think about this.
Yeah I misunderstood what you were asking about. If you had some blocking get_latest_sample function, this should work - mostly the same as offline training but using an input of size 1 (or some chosen chunk size). Importantly, keep the last input and hidden state around for future time steps (I also updated the above gist to pass hidden as an argument):
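last_targets = get_latest_sample()
hidden = None
while True:
    # the previous sample is the input, the newest one is the target
    inputs = last_targets
    targets = get_latest_sample()
    outputs, hidden = model(inputs, hidden)
    loss = criterion(outputs, targets)
    # keep the latest sample around for the next time step
    last_targets = targets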
The above no longer works in current versions of PyTorch as far as I’m aware. It won’t allow you to keep the hidden state like that without detaching it from the graph, as it will complain about freeing up the graph or in-place modifications of the tensors. I’m looking for a way around that, which is why I’ve opened this thread here: How to backpropagate a loss through time-series RNN?. Any help with the matter would be much appreciated!
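For reference, one common workaround is to detach the hidden state before each step, sketched below. Note that this truncates backpropagation to a single time step, so it is not full backpropagation through time. TinyRNN, sample_stream and all sizes and learning rates are hypothetical stand-ins for illustration (sample_stream plays the role of the blocking get_latest_sample); none of them come from the gist above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyRNN(nn.Module):
    """Hypothetical model: one GRU layer followed by a linear readout."""
    def __init__(self, size=1, hidden_size=16):
        super().__init__()
        self.gru = nn.GRU(size, hidden_size)
        self.out = nn.Linear(hidden_size, size)

    def forward(self, inputs, hidden=None):
        # inputs: S x B x N (sequence, batch, size)
        features, hidden = self.gru(inputs, hidden)
        return self.out(features), hidden

def sample_stream(n_steps):
    """Stand-in for get_latest_sample(): yields one sine value per step."""
    for t in range(n_steps):
        yield torch.sin(torch.tensor(0.1 * t)).view(1, 1, 1)

model = TinyRNN()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

stream = sample_stream(200)
last_targets = next(stream)
hidden = None
losses = []

for targets in stream:
    inputs = last_targets
    if hidden is not None:
        # Cut the graph here: keep the values, drop the history,
        # so backward() only covers the current step
        hidden = hidden.detach()
    outputs, hidden = model(inputs, hidden)
    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    last_targets = targets
```

The hidden state still carries information forward between samples; only the gradient stops at the detach. If gradients need to flow across several steps, an option is to accumulate the loss over a small window of samples and call backward() once per window, detaching only at window boundaries.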