I have been trying to use the Python subprocess module while training a neural network in PyTorch, but I noticed that subprocess.run() becomes many times slower once a network has been initialized on a GPU. Here is the example script I used: a very simple linear network, with a simple subprocess call looped 100 times and the timings profiled using line_profiler:
import torch
import torch.nn as nn
import subprocess
from line_profiler import LineProfiler

class TestNN(nn.Module):
    def __init__(self, device):
        super(TestNN, self).__init__()
        self.fc1 = nn.Linear(5, 16)
        self.device = device
        self.to(self.device)

def test_subprocess():
    device = torch.device('cuda:0')
    testNet = TestNN(device)
    for i in range(100):
        subprocess.run(["ls", "-l"], capture_output=True)

if __name__ == '__main__':
    lprofiler = LineProfiler()
    lp_wrapper = lprofiler(test_subprocess)
    lp_wrapper()
    lprofiler.print_stats()
Just moving this small network to the GPU makes the execution of subprocess.run() more than 4x slower.
My results from line_profiler when the network is on CPU:

Total time: 1.46088 s
File: test_subprocess_gpu.py
Function: test_subprocess at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           def test_subprocess():
    14         1        172.0    172.0      0.0      device = torch.device('cpu')
    15         1        806.0    806.0      0.1      testNet = TestNN(device)
    16
    17       101       1235.0     12.2      0.1      for i in range(100):
    18       100    1458671.0  14586.7     99.8          subprocess.run(["ls", "-l"], capture_output=True)
My results when the network is initialized on GPU:

Timer unit: 1e-06 s

Total time: 8.63406 s
File: test_subprocess_gpu.py
Function: test_subprocess at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           def test_subprocess():
    14         1        174.0    174.0      0.0      device = torch.device('cuda:0')
    15         1    2084937.0  2084937.0   24.1      testNet = TestNN(device)
    16
    17       101       1163.0     11.5      0.0      for i in range(100):
    18       100    6547789.0  65477.9     75.8      subprocess.run(["ls", "-l"], capture_output=True)
Does anyone know what causes this slowdown, and how to speed up subprocess.run() with the network initialized on the GPU? I am very puzzled as to why initializing a neural network on the GPU would have any effect on the speed of subprocess.run(). Any help is greatly appreciated!
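For reference, the same per-call cost can be cross-checked without line_profiler using a plain time.perf_counter() harness (a minimal sketch; run it once before and once after moving the network to the GPU to compare the two cases):

```python
import subprocess
import time

def time_subprocess(n=20):
    # Time n subprocess.run() calls and return the average seconds per call.
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(["ls", "-l"], capture_output=True)
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    per_call = time_subprocess()
    print(f"{per_call * 1e3:.2f} ms per subprocess.run() call")
```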