RAM usage increases linearly

Hi, the code below increases memory usage linearly, and at a certain point I am no longer able to train the model. Surprisingly, this is the first time I am facing a problem with this code.

Doubts:

  1. Vector images: the vector images are the only new data involved in the code below, and commenting out the line which loads them makes the code run normally.

I really have no idea, any hint or suggestion would be highly appreciated.
Thank you,
Nilesh Pandey

import os
import os.path as osp
import random

import numpy as np
import matplotlib.pyplot as plt
import imgaug.augmenters as iaa
from PIL import Image
from skimage.exposure import rescale_intensity  # assumed source of rescale_intensity
from skimage.transform import resize            # assumed source of resize
import torchvision.transforms.functional as TF
from torch.utils import data

class ImgAugTransform:
    def __init__(self):
        sometimes = lambda aug: iaa.Sometimes(0.5, aug)  # currently unused helper

        self.aug = iaa.Sequential([
            iaa.Affine(
                translate_percent={"x": 0.2, "y": 0.1},
                rotate=40,
                mode='symmetric'
            )
        ])
    def __call__(self, img, img1, img2, img3):
        img = np.array(img)
        img1 = np.array(img1)
        img2 = np.array(img2)
        img3 = np.array(img3)

        return (self.aug.augment_image(img), self.aug.augment_image(img1),
                self.aug.augment_image(img2), self.aug.augment_image(img3))

class Dataset(data.Dataset):
    """Dataset for XXX."""
    def __init__(self, height):
        super(Dataset, self).__init__()
        # base setting
        self.files = []
        self.vector = []
        with open("/home/XXX/train.txt", 'r') as f:
            for line in f.readlines():
                im_name, v_name = line.strip().split()
                self.files.append(im_name)
                self.vector.append(v_name)

        self.masks = os.listdir("/home/XXX/")
        self.range = np.arange(len(self.masks))
        self.rotate = ImgAugTransform()
    def name(self):
        return "XXX"

    def transformData(self, src, mask, target, ref_lr):
        if random.random() > 0.5:
            src, mask, target, ref_lr = self.rotate(src, mask, target, ref_lr)

        # Transform to tensor
        src = TF.to_tensor(src)
        mask = TF.to_tensor(mask)
        target = TF.to_tensor(target)
        ref_lr = TF.to_tensor(ref_lr)

        src = TF.normalize(src, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        mask = TF.normalize(mask, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        target = TF.normalize(target, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ref_lr = TF.normalize(ref_lr, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        return src, mask, target, ref_lr

    
    def __getitem__(self, index):
        file = self.files[index]
        mask = self.masks[random.choice(self.range)]
        vector = self.vector[index]
        # person image
        targ = rescale_intensity(plt.imread(osp.join('/homeXXX/', file)) / 255)
        vec = rescale_intensity(plt.imread(osp.join('/home/XXX', vector)) / 255)
        mask = rescale_intensity(plt.imread(osp.join('/home/XXX', 'maskA', mask)) / 255)

        targ = resize(targ, (256, 256))
        vec = resize(vec, (256, 256))
        mask = resize(mask, (256, 256))

        # broadcast the single-channel mask to three channels
        ms2 = mask * 1
        ms2 = np.expand_dims(ms2, axis=2)
        ms2 = np.repeat(ms2, repeats=3, axis=2)

        # white out the masked region of the target
        src = targ * (1 - ms2) + ms2

        src = Image.fromarray(np.uint8(src * 255))
        mask = Image.fromarray(np.uint8(ms2 * 255))
        target = Image.fromarray(np.uint8(targ * 255))
        vec = Image.fromarray(np.uint8(vec * 255))
        source, mask, target, ref = self.transformData(src, mask, target, vec)
        return source, mask, target, ref

I have tried running all my previous code and projects, and basically all of them cause RAM consumption to increase linearly.
The only change in my hardware setup is reduced storage space: I currently have less than 40 GB free. As far as I know that shouldn't matter, but does anyone think it could be related?
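A quick way to confirm the growth described above (not from the original thread) is to log the process's resident memory once per iteration; this sketch uses only the standard library's `resource` module (Unix-only), with a list allocation standing in for a training step:

```python
import resource

def rss_mb():
    """Peak resident set size of this process in MB (ru_maxrss is KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

baseline = rss_mb()
for i in range(5):
    _ = [0] * 1_000_000  # hypothetical stand-in for one training iteration
    print(f"iter {i}: {rss_mb() - baseline:.1f} MB above baseline")
```

If the printed value keeps climbing iteration after iteration instead of plateauing, something is accumulating.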

If I understand your issue correctly, commenting out this line:

vec = rescale_intensity(plt.imread(osp.join('/home/XXX', vector))/255)

yields normal behavior, while keeping it increases the memory?
How did you define rescale_intensity?

@nile649
Your projects were running fine, and now, after changing the storage, you are noticing increasing memory usage? Did you change anything else (PyTorch version, etc.)?

Actually, I just cross-checked with previous projects, and it is happening with them as well. I doubt it is the code; it must be something else. I did look at previous posts about assigning different variables, not using lists, and so on, but this seems to be a different problem.

All my research projects over the last year use the same dataloader, and this is the first time I am having memory issues, even with old projects that used to run fine.

Check if you are accidentally storing the computation graph, e.g. by accumulating the loss without detaching it or by appending it to a list.
Could you create a minimal code snippet to reproduce this issue?
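The failure mode described above can be sketched in a few lines; the tensors and the loop here are illustrative, not from the original code:

```python
import torch

x = torch.randn(8, requires_grad=True)

# Leaky pattern: each stored loss tensor keeps its whole autograd graph alive,
# so memory grows with every iteration.
losses = []
for _ in range(3):
    loss = (x ** 2).sum()
    losses.append(loss)              # still attached to the graph
print(losses[0].requires_grad)       # True -> graph retained

# Safe pattern: detach (or call .item()) before storing,
# so the graph can be freed after each step.
safe = []
for _ in range(3):
    loss = (x ** 2).sum()
    safe.append(loss.detach())       # plain tensor, no graph
print(safe[0].requires_grad)         # False
```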


Not sure if it's related, but I am having the same issue with the Mask R-CNN PyTorch version.

You're right. I debugged step by step and the following line is the culprit: I was treating the loss term as a scalar while calculating the average loss over the epoch.

avg_loss_g = (avg_loss_g+loss_G)/(i+1) 

So if the problem is not with the variables in the dataloader, it is with the variables in the main function.
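A hedged sketch of one possible fix, using hypothetical stand-in values for `loss_G`: calling `.item()` converts the loss to a plain Python float so the autograd graph behind it can be freed, and the average is computed as a running sum divided by the count (which also differs from the original `(avg + loss)/(i+1)` expression):

```python
import torch

# Hypothetical stand-ins for the per-iteration generator losses;
# each is a tensor attached to an autograd graph, like loss_G would be.
losses_g = [torch.tensor(0.5, requires_grad=True) * 2 for _ in range(4)]

running_sum = 0.0
for i, loss_G in enumerate(losses_g):
    running_sum += loss_G.item()     # .item() detaches; the graph can be freed
    avg_loss_g = running_sum / (i + 1)
print(avg_loss_g)  # 1.0
```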

We are talking about CPU Memory, right?

Yes, CPU memory.

This, together with the @ptrblck response, made me understand the potential causes of memory leaks in PyTorch. I suggest reading through the links I have posted.