DefaultCPUAllocator: not enough memory: you tried to allocate 34359738368 bytes

I am new to PyTorch and have recently been working on a project in PyCharm whose main goal is to have an LSTM-based neural network classify an activity from video input. The video is first converted into individual image frames, and the activity is then classified based on the differences between consecutive frames. So far I have been able to convert my videos to image frames, but while building the neural network I get the following error: DefaultCPUAllocator: not enough memory: you tried to allocate 34359738368 bytes (about 32 GB).
I have tried decreasing the batch size, reducing the number of features, and switching to a better, faster laptop, but I haven't been able to solve the problem.

CNN_Classification_Training.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms, datasets, models
import torch.optim as optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt
import copy
import time
import os

os.environ['KMP_DUPLICATE_LIB_OK']='True'
if __name__ == '__main__':
    criterion = nn.NLLLoss()
    use_gpu = torch.cuda.is_available()
    if use_gpu:
        pinMem = True
    else:
        pinMem = False
    
    trainDir = 'train_5class'
    valDir = 'test_5class'
    apply_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet statistics, expected by the pretrained ResNet
                             [0.229, 0.224, 0.225])])


    train_dataset = datasets.ImageFolder(trainDir,transform=apply_transform)
    trainLoader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True,num_workers=4, pin_memory=pinMem)

    test_dataset = datasets.ImageFolder(valDir,transform=apply_transform)
    testLoader = torch.utils.data.DataLoader(test_dataset, batch_size=128, shuffle=False,num_workers=4, pin_memory=pinMem)


    print('No. of samples in train set: '+str(len(trainLoader.dataset)))
    print('No. of samples in test set: '+str(len(testLoader.dataset)))

    net = models.resnet18(pretrained=True)
    print(net)



    totalParams = 0
    for params in net.parameters():
        print(params.size())
        totalParams += np.sum(np.prod(params.size()))
    print('Total number of parameters: '+str(totalParams))

    net.fc = nn.Linear(512,101)
    if use_gpu:
        net = net.cuda()  # move the model to the GPU so it matches the inputs

    # create the optimizer once; re-creating it inside the batch loop would reset Adam's state
    optimizer = optim.Adam(net.parameters(), lr=1e-4)

    iterations = 2

    trainLoss = []
    trainAcc = []
    testLoss = []
    testAcc = []

    start = time.time()
    for epoch in range(iterations):
        epochStart = time.time()
        runningLoss = 0.0
        avgTotalLoss = 0.0
        running_correct = 0
        count = 0
        net.train(True) # For training
        batchNum = 1
        for data in trainLoader:
            count=count+1
            print(count)
            inputs,labels = data

            if use_gpu:
                inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                running_correct += (predicted.cpu() == labels.data.cpu()).sum().item()
            else:
                inputs, labels = Variable(inputs), Variable(labels)
                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                running_correct += (predicted == labels.data).sum().item()
       

            optimizer.zero_grad()


            loss = criterion(F.log_softmax(outputs, dim=1), labels)

            loss.backward()

            optimizer.step()

            runningLoss += loss.item()
            batchNum += 1

        avgTrainAcc = running_correct/float(len(trainLoader.dataset))
        avgTrainLoss = runningLoss/float(count)  # NLLLoss already averages within a batch, so average over batches
        trainAcc.append(avgTrainAcc)
        trainLoss.append(avgTrainLoss)
    

        net.train(False) # For evaluation
        running_correct = 0
        runningLoss = 0.0  # reset, otherwise the training loss would leak into the test loss
        cnt = 0
        print("GOING IN INNER FOR-LOOP")
        with torch.no_grad():  # gradients are not needed for evaluation, which also saves memory
            for data in testLoader:
                cnt = cnt+1
                print(cnt)
                inputs,labels = data
                # Wrap them in Variable
                if use_gpu:
                    inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
                    outputs = net(inputs)
                    _, predicted = torch.max(outputs.data, 1)
                    running_correct += (predicted.cpu() == labels.data.cpu()).sum().item()
                else:
                    inputs, labels = Variable(inputs), Variable(labels)
                    outputs = net(inputs)
                    _, predicted = torch.max(outputs.data, 1)
                    running_correct += (predicted == labels.data).sum().item()

                loss = criterion(F.log_softmax(outputs, dim=1), labels)

                runningLoss += loss.item()

        avgTestLoss = runningLoss/float(cnt)  # average over batches, as above
        avgTestAcc = running_correct/float(len(testLoader.dataset))
        testAcc.append(avgTestAcc)
        testLoss.append(avgTestLoss)
    

        fig1 = plt.figure(1)
        plt.plot(range(epoch+1),trainLoss,'r-',label='train')
        plt.plot(range(epoch+1),testLoss,'g-',label='test')
        if epoch==0:
            plt.legend(loc='upper left')
            plt.xlabel('Epochs')
            plt.ylabel('Loss')

        fig2 = plt.figure(2)
        plt.plot(range(epoch+1),trainAcc,'r-',label='train')
        plt.plot(range(epoch+1),testAcc,'g-',label='test')
        if epoch==0:
            plt.legend(loc='upper left')
            plt.xlabel('Epochs')
            plt.ylabel('Accuracy')
        
  
        
        epochEnd = time.time()-epochStart
        print('Iteration: {:.0f} /{:.0f};  Training Loss: {:.6f} ; Training Acc: {:.3f}'\
              .format(epoch + 1,iterations,avgTrainLoss,avgTrainAcc*100))
        print('Iteration: {:.0f} /{:.0f};  Testing Loss: {:.6f} ; Testing Acc: {:.3f}'\
              .format(epoch + 1,iterations,avgTestLoss,avgTestAcc*100))
   
        print('Time consumed: {:.0f}m {:.0f}s'.format(epochEnd//60,epochEnd%60))
    end = time.time()-start
    print('Training completed in {:.0f}m {:.0f}s'.format(end//60,end%60))

    torch.save(net.state_dict(), 'resnet18Pre_fcOnly5class_ucf101_10adam_1e-4_b128.pt')


CNN_Train.py

import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import models, transforms

from PIL import Image
import os
import numpy as np
import pickle


# Check availability of GPU
use_gpu = torch.cuda.is_available()

# Load train-test list
with open('trainList_5class.pckl','rb') as f:
    trainList = pickle.load(f)
with open('testList_5class.pckl','rb') as f:
    testList = pickle.load(f)
    
classes = []
for item in trainList:
    c = item.split('_')[1]
    if c not in classes:
        classes.append(c)

net = models.resnet18()
net.fc = nn.Linear(512,101)
# Loading saved states
net.load_state_dict(torch.load('resnet18Pre_fcOnly5class_ucf101_10adam_1e-4_b128.pt'))

# Removing the fully connected layer for feature extraction
model = nn.Sequential(*list(net.children())[:-1])
model.eval()  # use the stored BatchNorm statistics; frames are fed one at a time here
if use_gpu:
    model = model.cuda()

data_transforms = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],  # same normalization as during training
                             [0.229, 0.224, 0.225])
    ])

framePath = 'frames/'
for item in trainList:
    cName = item.split('_')[1]
    srcPath = framePath+cName+'/'+item    
    fNames = os.listdir(srcPath)
    # filename template
    fTemplate = fNames[0].split('_')
    fCount = len(fNames)
    for fNum in range(fCount):
        fileName = fTemplate[0]+'_'+fTemplate[1]+'_'+fTemplate[2]+'_'+fTemplate[3]+'_'+str(fNum+1)+'.jpg'
        if os.path.exists(srcPath+'/'+fileName):
            # Loading image
            img = Image.open(srcPath+'/'+fileName)
            # Transform to tensor
            imgTensor = data_transforms(img).unsqueeze(0)
            if use_gpu:
                inp = Variable(imgTensor.cuda())
            else:
                inp = Variable(imgTensor)
            # Feed-forward through model+stack features for each video
            if fNum == 0:
                out = model(inp)                
                out = out.view(out.size()[0],-1).data.cpu()                
            else:
                out1 = model(inp)               
                out1 = out1.view(out1.size()[0],-1).data.cpu()                
                out = torch.cat((out,out1),0)
        else:
            print(fileName+ ' missing!')       
    # out dimension -> frame count x 512
    featSavePath = 'ucf101_resnet18Feat/train/'+cName # Directory for saving features
    if not os.path.exists(featSavePath):
        os.makedirs(featSavePath)
    torch.save(out,os.path.join(featSavePath,item+'.pt'))

framePath = 'frames/'
for item in testList:
    cName = item.split('_')[1]
    srcPath = framePath+cName+'/'+item    
    fNames = os.listdir(srcPath)
    fTemplate = fNames[0].split('_')
    fCount = len(fNames)
    for fNum in range(fCount):
        fileName = fTemplate[0]+'_'+fTemplate[1]+'_'+fTemplate[2]+'_'+fTemplate[3]+'_'+str(fNum+1)+'.jpg'
        if os.path.exists(srcPath+'/'+fileName):
            img = Image.open(srcPath+'/'+fileName)
            imgTensor = data_transforms(img).unsqueeze(0)
            if use_gpu:
                inp = Variable(imgTensor.cuda())
            else:
                inp = Variable(imgTensor)
            if fNum == 0:
                out = model(inp)                
                out = out.view(out.size()[0],-1).data.cpu()
                
            else:
                out1 = model(inp)               
                out1 = out1.view(out1.size()[0],-1).data.cpu()                
                out = torch.cat((out,out1),0)
        else:
            print(fileName+ ' missing!')
      
    featSavePath = 'ucf101_resnet18Feat/test/'+cName
    if not os.path.exists(featSavePath):
        os.makedirs(featSavePath)
    torch.save(out,os.path.join(featSavePath,item+'.pt'))

I guess you might be running out of memory in this line of code:

out = torch.cat((out,out1),0)

since you are recreating a tensor in each iteration by concatenating the new output to it.
Note that this approach is slow and you should rather append the outputs to a list and create the tensor afterwards.
However, even in this case, you would be storing the outputs for the entire dataset in a single tensor, so you would need to check if your system has enough RAM to do so.
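
For example, a minimal self-contained sketch of the list-based approach (torch.nn.Identity and the random tensors below just stand in for your feature extractor and per-frame inputs):

import torch

model = torch.nn.Identity()                         # stand-in for your ResNet feature extractor
frames = [torch.randn(1, 512) for _ in range(100)]  # stand-in for the per-frame inputs

feats = []
with torch.no_grad():                               # no autograd graph is needed for feature extraction
    for f in frames:
        out = model(f)
        feats.append(out.view(out.size(0), -1).cpu())  # move each result off the GPU right away
out = torch.cat(feats, 0)                           # concatenate once at the end -> frame count x 512

This frees each intermediate result as usual instead of repeatedly allocating a new, ever-growing tensor for every frame.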

Hey, thanks for the answer! I have tried reducing my dataset by a great deal, but the error still persists. Is there any other way I can make my tensor use less memory?

I would try to narrow down which part of the code tries to allocate the mentioned 32 GB and then check how to avoid it. As previously described, you might want to append the tensors to a list first, if this line of code is indeed causing the issue.
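
For example, a small hypothetical helper you could drop into the frame loop to watch out grow (the random tensor below only stands in for your concatenated features):

import torch

def tensor_gib(t):
    # rough size of the tensor's storage in GiB
    return t.element_size() * t.nelement() / 1024**3

out = torch.randn(300, 512)  # stand-in for the concatenated per-frame features
print('out is {} -> {:.6f} GiB'.format(tuple(out.shape), tensor_gib(out)))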

Hey, thanks a lot! Appending the tensors to a list really helped me and the error is gone. By the way, is it possible to deploy this model in an Android application so that it could dynamically recognize the activity shown on the screen? I have deployed an image classification model before, but this is my first time working with neural networks. If it's possible, could you please briefly outline the steps? I searched all over the internet but didn't find anything suitable.

This Android tutorial might be helpful.
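
If it helps, here is a rough sketch of the export step for PyTorch Mobile, assuming you deploy the trained classifier as a TorchScript module (the output file name is just an example):

import torch
from torchvision import models

net = models.resnet18()
net.fc = torch.nn.Linear(512, 101)
net.load_state_dict(torch.load('resnet18Pre_fcOnly5class_ucf101_10adam_1e-4_b128.pt'))
net.eval()                              # switch BatchNorm/Dropout to eval mode before tracing

example = torch.rand(1, 3, 224, 224)    # dummy input matching the 224x224 transforms
traced = torch.jit.trace(net, example)  # record the forward pass as TorchScript
traced.save('resnet18_mobile.pt')       # load this file from the Android app

torch.utils.mobile_optimizer.optimize_for_mobile can additionally optimize the traced module before saving, and the linked tutorial covers the Android side of loading and running it.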
