RuntimeError: arguments are located on different GPUs, customized RNN

chihyaoma · August 18, 2017, 10:25pm

I have recently tried to implement my customized RNN.

Looking over the posts about customized RNNs in PyTorch, many people suggest this as a good example to start with:

BN-LSTM

jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py

"""Implementation of batch-normalized LSTM."""
import torch
from torch import nn
from torch.autograd import Variable
from torch.nn import functional, init


class SeparatedBatchNorm1d(nn.Module):

    """
    A batch normalization module which keeps its running mean
    and variance separately per timestep.
    """

    def __init__(self, num_features, max_length, eps=1e-5, momentum=0.1,
                 affine=True):
        """
        Most parts are copied from
        torch.nn.modules.batchnorm._BatchNorm.
        """

This file has been truncated.

The code runs smoothly on a single GPU. When I started using it with multiple GPUs, I received the following error with either LSTMCell or BNLSTMCell:

RuntimeError: arguments are located on different GPUs at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMathBlas.cu:232

Looking at the code, I can't figure out exactly what is causing this problem. Could someone point me in the right direction?