RuntimeError: arguments are located on different GPUs, with a custom RNN

I recently tried to implement a custom RNN.

Looking over the posts about custom RNNs in PyTorch, I found that many people suggest this example as a good starting point:

BN-LSTM

The code runs smoothly on a single GPU. When I started using it with multiple GPUs, I got the following error with both LSTMCell and BNLSTMCell:

RuntimeError: arguments are located on different GPUs at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMathBlas.cu:232

Looking at the code, I can’t figure out exactly what is causing this problem. Could someone point me in the right direction?
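For context, here is a minimal sketch of the pattern that is commonly involved in this kind of error. It is not the BN-LSTM code: `TinyRNNCell` and its sizes are hypothetical names made up for illustration, and it assumes a recent PyTorch version and that the model is wrapped in `nn.DataParallel`. The idea is that every tensor used in `forward` should either be registered on the module (so it is copied to each replica's GPU) or be created on the same device as the input, rather than with a bare `.cuda()` call, which always targets the current default GPU.

```python
import torch
import torch.nn as nn


class TinyRNNCell(nn.Module):
    """Hypothetical minimal RNN cell for illustration (not the BN-LSTM code)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Registered as nn.Parameter so nn.DataParallel replicates them per GPU.
        self.weight_ih = nn.Parameter(torch.randn(input_size, hidden_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, h=None):
        if h is None:
            # Allocate the initial state on x's device and with x's dtype,
            # instead of torch.zeros(...).cuda(), which picks the default GPU.
            h = x.new_zeros(x.size(0), self.hidden_size)
        return torch.tanh(x @ self.weight_ih + h @ self.weight_hh + self.bias)


cell = TinyRNNCell(input_size=8, hidden_size=16)
x = torch.randn(4, 8)

# Only use DataParallel when more than one GPU is actually present;
# otherwise the sketch runs unchanged on the CPU.
if torch.cuda.device_count() > 1:
    cell = nn.DataParallel(cell).cuda()
    x = x.cuda()

out = cell(x)
```

If the failing code builds hidden states, masks, or other helper tensors with a hard-coded device, that would be the first place to check, though this is only a guess without seeing the full traceback.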