CNN memory leak problem

Hi,

Thank you for PyTorch. Would you have a hint on how to approach ever-increasing memory use?
I am using PyTorch to train a CNN, and I notice that the (RAM, but not GPU) memory usage increases from one epoch to the next.

After the number of epochs reaches 1000, the RAM is full and training cannot continue.

Can you give me some suggestions?

Thank you so much.

You are likely keeping references to Variables somewhere. Very likely you are doing:

total_loss = total_loss + current_loss

instead of:

total_loss = total_loss + current_loss.data[0]
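
A minimal sketch of the difference, assuming a standard training loop (model, criterion, optimizer, and train_loader are placeholder names):

total_loss = 0
for data, target in train_loader:
    output = model(data)
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # leaks: total_loss stays a Variable, so every iteration's graph is kept alive
    # total_loss = total_loss + loss
    # safe: .data[0] extracts a plain Python number, so each graph can be freed
    total_loss = total_loss + loss.data[0]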

@smth
Thank you so much for your reply.
But it did not seem to work.
I use multiple subprocesses to load the data (num_workers=8), and the (RAM, but not GPU) memory usage still grows as the epochs go on.
I thought maybe I could kill the subprocesses after a few epochs and then spawn new ones to continue training, but I don't know how to kill the subprocesses from the main process.
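
For what it's worth, the DataLoader already shuts its worker subprocesses down once an epoch's iterator is exhausted and forks fresh ones on the next pass, so a plain nested loop (num_epochs and train_step are placeholder names here) restarts the workers every epoch without any manual killing:

for epoch in range(num_epochs):
    # iterating train_loader creates a new iterator, which spawns
    # num_workers fresh subprocesses and joins them at the end of the pass
    for batch_idx, (img1, img2, img3) in enumerate(train_loader):
        train_step(img1, img2, img3)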

Can you give me some suggestions?

Thank you so much.

When I set num_workers = 0, the (RAM, but not GPU) memory usage does not increase much as the epochs go on…
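
One way to narrow this down is to log the resident memory of the main process and of the loader workers at each epoch, e.g. with the third-party psutil package (a diagnostic sketch, not from the original code):

import os
import psutil

def log_memory(epoch):
    proc = psutil.Process(os.getpid())
    main_mb = proc.memory_info().rss / 1024 ** 2  # resident set size, MB
    # sum the resident memory of any live DataLoader worker subprocesses
    worker_mb = sum(c.memory_info().rss for c in proc.children()) / 1024 ** 2
    print('epoch %d: main %.1f MB, workers %.1f MB' % (epoch, main_mb, worker_mb))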

Can you give me some suggestions or instructions about the problem?

Thank you so much.

What are you loading here? Images or some other format?
Someone recently reported that when loading TIFFs, the Python library they were using had a memory leak.
Also, are you using a custom Dataset class or one from torchvision / torchtext?

Thank you for your reply.

Loading images (.jpg), as shown below:

kwargs = {'num_workers': 4, 'pin_memory': True} if args.cuda else {}

train_loader = torch.utils.data.DataLoader(
    TripletImageLoader('../data', trainImage, trainTriplet,
                       transform=transforms.Compose([
                           transforms.Scale(args.imageSize),
                           transforms.ToTensor(),
                           # mean/std were left blank in the original post;
                           # (0.5,) is only a placeholder
                           transforms.Normalize((0.5,), (0.5,)),
                       ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

This should definitely not run out of memory. Can you share the code for your data loader?

Hi, thank you for your reply.
The data loader code is shown below:

import os

import torch.utils.data
from PIL import Image


def default_image_loader(path):
    return Image.open(path)


class TripletMNIST(torch.utils.data.Dataset):
    def __init__(self, base_path, filenames_filename, triplets_file_name,
                 transform=None, loader=default_image_loader):
        self.base_path = base_path
        # one image filename per line
        self.filenamelist = []
        for line in open(filenames_filename):
            self.filenamelist.append(line.rstrip('\n'))
        # each line holds three indices into filenamelist: anchor, far, close
        triplets = []
        for line in open(triplets_file_name):
            triplets.append((line.split()[0], line.split()[1], line.split()[2]))
        self.triplets = triplets
        self.transform = transform
        self.loader = loader

    def __getitem__(self, index):
        path1, path2, path3 = self.triplets[index]
        img1 = self.loader(os.path.join(self.base_path, self.filenamelist[int(path1)]))
        img2 = self.loader(os.path.join(self.base_path, self.filenamelist[int(path2)]))
        img3 = self.loader(os.path.join(self.base_path, self.filenamelist[int(path3)]))
        if self.transform is not None:
            img1 = self.transform(img1)
            img2 = self.transform(img2)
            img3 = self.transform(img3)
        return img1, img2, img3

    def __len__(self):
        return len(self.triplets)
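
One thing worth checking in default_image_loader: Image.open is lazy and keeps the underlying file handle open until the pixel data is actually read. A variant that reads the data eagerly and closes the file (a sketch, assuming Pillow; torchvision's own JPEG loader follows the same pattern) would rule out file-handle buildup in the workers:

from PIL import Image

def default_image_loader(path):
    with open(path, 'rb') as f:
        img = Image.open(f)
        # .convert() forces the pixel data to be read before the file closes
        return img.convert('RGB')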