Hi, I have a machine with two GPUs (GPU 0 and GPU 1) and am trying to run an nn.LSTM on GPU 1, but I observe that the memory consumption of both GPUs rises after deploying the LSTM on GPU 1.
Here is a minimal snippet that reproduces the issue (the layer sizes are arbitrary):

from torch.nn import LSTM
lstm = LSTM(input_size=10, hidden_size=20)
lstm.cuda(1)
After executing the last line, the memory usage of GPU 0 jumps from 1059 MB to 1304 MB, and the memory usage of GPU 1 jumps from 2 MB to 339 MB. I have attempted to trace the cause, and it seems that the __exit__ method of the device class in torch/cuda/__init__.py is allocating some FloatTensors on GPU 0. To be more specific, I am referring to this block of code:
def __exit__(self, *args):
    if self.prev_idx != self.idx:
        torch._C._cuda_setDevice(self.prev_idx)
    return False
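To show what I mean without needing a GPU, here is a small mock of that save/restore pattern as I understand it (FakeCuda and DeviceContext are illustrative names I made up, not real torch internals):

```python
class FakeCuda:
    """Stand-in for the CUDA runtime's notion of the current device."""
    def __init__(self):
        self.current = 0  # the process starts with device 0 active

    def set_device(self, idx):
        self.current = idx


class DeviceContext:
    """Mimics torch.cuda.device: save the active device, switch, restore."""
    def __init__(self, cuda, idx):
        self.cuda = cuda
        self.idx = idx

    def __enter__(self):
        # record whichever device was active on entry
        self.prev_idx = self.cuda.current
        if self.prev_idx != self.idx:
            self.cuda.set_device(self.idx)

    def __exit__(self, *args):
        # mirrors the block quoted above: switch back to the previous device
        if self.prev_idx != self.idx:
            self.cuda.set_device(self.prev_idx)
        return False


cuda = FakeCuda()
with DeviceContext(cuda, 1):
    pass  # work happens with device 1 active
# after the with-block, device 0 is active again
```

Since the process starts on device 0, prev_idx is 0, and __exit__ switches back there, which (if I read it right) is where the GPU 0 context gets touched.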
After some debugging, I saw that self.prev_idx = 0. Is there a way to make the LSTM deploy entirely on GPU 1?