Part of LSTM deployed on the wrong GPU

Hi, I have a machine with two GPUs (GPU 0 and GPU 1) and am trying to run an nn.LSTM on GPU 1, but I observe that the memory consumption of both GPUs rises after deploying the LSTM on GPU 1.

Here is a minimal snippet that reproduces the issue.

import torch
from torch.nn import LSTM

lstm = LSTM(input_size=10, hidden_size=20)  # arbitrary sizes, just for the repro
lstm.cuda(1)  # move the module to GPU 1

After executing the last line, the memory usage of GPU 0 jumps from 1059 MB to 1304 MB, and the memory usage of GPU 1 jumps from 2 MB to 339 MB. I have attempted to trace the cause of this problem, and it seems like the __exit__ method of the device class in torch/cuda/__init__.py is allocating some FloatTensors on GPU 0. To be more specific, I am referring to this block of code.

def __exit__(self, *args):
    if self.prev_idx != self.idx:
        torch._C._cuda_setDevice(self.prev_idx)
    return False

After some debugging, I saw that self.prev_idx = 0. Is there a way to make the LSTM deploy completely on GPU 1?

This was a bug; I think it's fixed on master via https://github.com/pytorch/pytorch/pull/2179

It will be part of the next release, or you can install the master version from source (instructions here: https://github.com/pytorch/pytorch#from-source).
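In the meantime, a possible workaround (just a sketch, not verified against this particular bug) is to keep GPU 1 as the current device while the module is constructed and moved, so that any tensors allocated as a side effect also land on GPU 1:

import torch
from torch.nn import LSTM

# Sketch of a workaround: make GPU 1 the current device for the whole block,
# so side-effect allocations target GPU 1 instead of GPU 0.
# The layer sizes are arbitrary placeholders.
with torch.cuda.device(1):
    lstm = LSTM(input_size=10, hidden_size=20)
    lstm.cuda(1)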
