Training LSTM 10x faster

I understand how to use nn.LSTM correctly, and I do have access to GPUs. So it might seem weird that I need to train them faster.

I’m trying to apply recurrent layers to reinforcement learning. On standard RL environments, training a non-recurrent agent takes 200k to 1 million gradient updates. Doing the same number of updates with a recurrent agent takes far too long when each update backpropagates through time over 200 to 1000 timesteps.

Are there ways to speed up LSTM training (even a 2x or 3x speedup would help tremendously), apart from (1) using nn.LSTM instead of nn.LSTMCell and (2) running on a GPU instead of the CPU?
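
For reference, here's a rough sketch of what I mean by (1) and (2). The sizes, shapes, and rollout length here are made up for illustration, not my actual model:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical sizes, just to illustrate the two approaches.
seq_len, batch_size, obs_dim, hidden_dim = 500, 32, 64, 128
obs = torch.randn(seq_len, batch_size, obs_dim, device=device)

# (1a) Slow version: step an nn.LSTMCell in a Python loop, one call per timestep.
cell = nn.LSTMCell(obs_dim, hidden_dim).to(device)
h = torch.zeros(batch_size, hidden_dim, device=device)
c = torch.zeros(batch_size, hidden_dim, device=device)
outputs = []
for t in range(seq_len):
    h, c = cell(obs[t], (h, c))
    outputs.append(h)
slow_out = torch.stack(outputs)  # (seq_len, batch, hidden)

# (1b) Faster version: hand the whole rollout to nn.LSTM in one call.
lstm = nn.LSTM(obs_dim, hidden_dim).to(device)
fast_out, (h_n, c_n) = lstm(obs)  # fast_out: (seq_len, batch, hidden)
```

My understanding is that on GPU nn.LSTM dispatches to a fused cuDNN kernel, which avoids the per-timestep Python overhead, but the sequential recurrence still dominates for rollouts this long.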

Thanks!

Do you mind sharing the model you’re using and how you are collecting observations from the environment?