I understand how to use `nn.LSTM` correctly, and I do have access to GPUs, so it might seem weird that I need LSTM training to go faster.
I’m trying to apply recurrent layers to reinforcement learning. On standard RL environments, training a non-recurrent agent takes 200k to 1 million gradient updates. Doing the same number of updates with a recurrent agent takes far too long, because each update backpropagates through time over a window of 200 to 1000 timesteps.
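For concreteness, here's a minimal sketch of the kind of update I'm talking about (the dimensions and the regression loss are made up for illustration, not my actual agent):

```python
import torch
import torch.nn as nn

# Made-up dimensions and loss, just to illustrate the cost of one update:
# the whole BPTT window sits in a single computation graph.
window, batch, obs_dim, hidden_dim = 500, 32, 16, 128
device = "cuda" if torch.cuda.is_available() else "cpu"

lstm = nn.LSTM(obs_dim, hidden_dim).to(device)
head = nn.Linear(hidden_dim, 1).to(device)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

obs = torch.randn(window, batch, obs_dim, device=device)
targets = torch.randn(window, batch, 1, device=device)

# One gradient update: both the forward pass and backward() scale with
# the window length, which is what makes 200-1000 steps so slow.
out, _ = lstm(obs)
loss = nn.functional.mse_loss(head(out), targets)
opt.zero_grad()
loss.backward()
opt.step()
```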
Are there ways to speed up LSTM training (even 2x or 3x would help tremendously), apart from (1) using `nn.LSTM` instead of `nn.LSTMCell` and (2) using a GPU instead of a CPU?
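For reference, here's a minimal sketch of what I mean by (1), with made-up dimensions; the fused `nn.LSTM` call already replaces the per-timestep `nn.LSTMCell` loop:

```python
import torch
import torch.nn as nn

# Made-up dimensions, just to spell out what I mean by (1).
seq_len, batch, in_dim, hid = 200, 32, 16, 128
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(seq_len, batch, in_dim, device=device)

# nn.LSTM: the whole sequence is handled in one (cuDNN-fused) call.
lstm = nn.LSTM(in_dim, hid).to(device)
out, (h_n, c_n) = lstm(x)

# nn.LSTMCell: a Python loop with one kernel launch per timestep,
# which is the slower pattern I'm already avoiding.
cell = nn.LSTMCell(in_dim, hid).to(device)
h_t = torch.zeros(batch, hid, device=device)
c_t = torch.zeros(batch, hid, device=device)
for t in range(seq_len):
    h_t, c_t = cell(x[t], (h_t, c_t))
```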