I have a general PyTorch CNN question. I am using a small network that consists of two Conv2d layers, two ReLUs, two pooling layers (MaxPool2d(kernel_size=2)), and two FC layers. A summary of my network is below.
self.unit_first_layer = Unit(in_channels=1, out_channels=64)
self.unit_second_layer = Unit(in_channels=64, out_channels=128)
self.net3 = nn.Sequential(self.unit_first_layer, self.relu1, self.pool1, self.unit_second_layer, self.relu4, self.pool1)
def forward(self, input):
    output = self.net3(input)
    output = self.drop_out(output)
    # Flatten the 128 feature maps of size 64 x 64 for the FC layer
    output = output.view(-1, 64 * 64 * 128)
    output = self.fc(output)
    return output
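As a rough sanity check on the sizes involved, the flatten dimension and the per-batch feature-map memory can be estimated by hand. The sketch below is only an estimate under stated assumptions: that `Unit` preserves the spatial size (for example, a padded 3x3 convolution), that the input images are 256 x 256 (which is what `64 * 64 * 128` implies after two 2x pools), and that activations are float32. The helper names `pooled_size` and `activation_bytes` are made up for illustration. It counts only the conv and pool outputs, so ReLU outputs and gradients would add to the total.

```python
def pooled_size(size, kernel=2, n_pools=2):
    """Spatial size after n_pools pooling layers with the given kernel/stride."""
    for _ in range(n_pools):
        size //= kernel
    return size


def activation_bytes(batch, channels, height, width, bytes_per_value=4):
    """Bytes needed to store one float32 activation tensor."""
    return batch * channels * height * width * bytes_per_value


batch = 10   # batch size from the post
side = 256   # ASSUMPTION: input height/width, inferred from 64 * 64 * 128

final_side = pooled_size(side)                  # 256 -> 128 -> 64
flat_features = final_side * final_side * 128   # matches view(-1, 64 * 64 * 128)

# ASSUMPTION: each Unit keeps height and width unchanged.
total_bytes = (
    activation_bytes(batch, 64, 256, 256)       # first Unit output
    + activation_bytes(batch, 64, 128, 128)     # after first pool
    + activation_bytes(batch, 128, 128, 128)    # second Unit output
    + activation_bytes(batch, 128, 64, 64)      # after second pool
)

print("flattened features per sample: {}".format(flat_features))
print("conv/pool activations per batch: {:.0f} MiB".format(total_bytes / (1024.0 * 1024.0)))
```

Under these assumptions the conv and pool outputs come to about 300 MiB per batch of 10, and that figure depends only on the batch size and image size, not on how many batches have been processed.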
Admittedly, I am training on a large amount of data (around 8 GB), but with a batch size of only 10. During training, my memory utilization keeps growing. Around the third epoch, RAM fills up and the system crashes.
for i, (images, labels) in enumerate(data_loader_train):
File "/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 819, in __iter__
File "/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 560, in __init__
File "/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
My question is: is this memory growth a normal part of training a CNN? I know the network has to create many feature maps. Is the growth expected, or is it more likely a programming error? I am using a training loop that is very similar to the one in many PyTorch tutorials.
I have a lot of RAM (64 GB). I just do not know whether this is my fault or simply part of training CNNs (that is, you need huge amounts of RAM). I am training on a CPU, with no GPU.
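One way to narrow down where the growth happens is to log the process's peak resident memory after every epoch. The following is a minimal sketch using only the standard library on a Unix-like system. The names `peak_rss_mb`, `run_epochs`, and `train_step` are hypothetical and do not come from the training code above; `train_step` stands in for whatever runs the forward pass, backward pass, and optimizer step on one batch.

```python
from __future__ import print_function

import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    It is a high-water mark, so it never decreases.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


def run_epochs(num_epochs, batches, train_step):
    """Run train_step on every batch and record peak memory after each epoch.

    train_step is a placeholder callable (an assumption of this sketch).
    If it keeps running totals, it should store plain Python numbers,
    e.g. loss.item(), rather than tensors still attached to the graph.
    """
    history = []
    for epoch in range(num_epochs):
        for batch in batches:
            train_step(batch)
        history.append(peak_rss_mb())
        print("epoch {}: peak RSS {:.1f} MiB".format(epoch, history[-1]))
    return history
```

If the logged value keeps climbing by a similar amount every epoch, that points toward something being retained across iterations rather than toward the fixed per-batch cost of the feature maps. Note that this only measures the main process, not any DataLoader worker processes.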