Hi,
I tried training an LSTM network with multiple processes (but a single GPU) using torch.multiprocessing, but I can't get rid of this warning:
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Below is a simple script that reproduces it. I tried calling flatten_parameters() right after model creation, before the model is shared with the other processes (marked (1) in the code), but that makes no difference at all. I also tried calling it after each process gets its own copy (marked (2)), but then the memory is not shared anymore. Without flattened parameters the networks use a whole lot more GPU memory (around two times more). Is there a way to solve this, i.e. keep the model shared across processes and still have flattened parameters?
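For reference, the sharing side can be checked in isolation. This is just a minimal CPU-only sanity check I use (the small throwaway LSTM here is not the model from the script below):

```python
import torch
from torch import nn

# After share_memory(), every parameter's storage should report as
# shared. flatten_parameters() on CUDA reallocates the weights into a
# fresh contiguous buffer, which is presumably why sharing gets lost
# when it is called afterwards.
lstm = nn.LSTM(8, 8)
lstm.share_memory()
print(all(p.is_shared() for p in lstm.parameters()))  # True
```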
I'm using Manjaro OS.
import time

import torch
from torch import nn
import torch.multiprocessing as mp

num_proc = 3
units = 1024


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lstm = nn.LSTM(units, units, num_layers=8)

    def forward(self, x):
        return self.lstm(x)


def train(pid, model):
    time.sleep(pid + 1)
    x = torch.randn(1, 1, units).cuda()
    # (2) calling flatten_parameters here makes the warning
    # go away, but it also makes the memory not shared anymore
    # model.lstm.flatten_parameters()
    while True:
        model(x)
        with torch.no_grad():
            for w in model.parameters():
                w += 1.
        print(pid, list(model.parameters())[0][0, 0].item(), flush=True)
        time.sleep(num_proc)


def main():
    mp.set_start_method('spawn')
    pool = mp.Pool(num_proc + 1)
    model = Net().cuda()
    # (1) this line doesn't make any difference,
    # no matter if it is called before or after share_memory()
    # model.lstm.flatten_parameters()
    model.share_memory()
    processes = []
    for pid in range(num_proc):
        processes.append(pool.apply_async(train, args=(pid, model)))
    try:
        for p in processes:
            p.get(timeout=1000000.)
    except KeyboardInterrupt:
        print('Terminating pool...')
        pool.terminate()


if __name__ == '__main__':
    main()