ResNet-152 feature extraction from the UCF-101 dataset is time-consuming

DeepLearner17 · December 14, 2017, 5:28pm

Hello,

I launched ResNet-152 to extract features from the UCF-101 dataset, which has 13,000 videos and more than 2 million frames. I started it on the GPU one week ago, as shown below, and it is still running. Is there any way to speed up the extraction?

Here is my code:

import cv2, subprocess
import numpy as np
import os, sys, collections, random, string
from scipy import ndimage
import matplotlib.pyplot as plt
import glob
from datetime import datetime
import torch
import torch.nn as nn
import torchvision.models as models
from torch.autograd import Variable
from torchvision.transforms import ToTensor
from PIL import Image
import pickle
from numpy import genfromtxt



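# Load the pretrained ResNet-152 and drop its final fully connected layer,
# so the network outputs pooled features instead of class scores.
resnet152 = models.resnet152(pretrained=True)
modules = list(resnet152.children())[:-1]
resnet152 = nn.Sequential(*modules)
frames = []
labels = []
# Walk the frame directory and run each image through the network one at a time.
for root, dirs, files in os.walk("/UCF/tmp_frames/test", topdown=False):
    for name in files:
        open_img = Image.open(os.path.join(root, name))
        image = ToTensor()(open_img).unsqueeze(0)  # add a batch dimension of 1
        img_var = Variable(image)
        features_var = resnet152(img_var)
        features = features_var.data
        features = features.numpy()
        features = features.ravel()  # flatten to a 1-D feature vector
        frames.append(features)

        # Use the part of the file name before the first dot as the label.
        label = name.split('.')[0]
        labels.append(label)

# Map each label to its feature vector.
dataset = dict(zip(labels, frames))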

Any suggestions for optimizing the feature extraction process?
For 26,000 frames it took four and a half hours.
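
As a rough back-of-the-envelope estimate, here is what that pace implies for the whole dataset. The figure of exactly 2,000,000 frames is an assumed round number, and the per-frame cost is assumed to stay constant:

```python
# Back-of-the-envelope estimate. total_frames is an assumed round figure
# ("more than 2 million frames"); a constant per-frame cost is also assumed.
frames_done = 26_000
hours_taken = 4.5
total_frames = 2_000_000

seconds_per_frame = hours_taken * 3600 / frames_done
total_days = seconds_per_frame * total_frames / 86_400

print(f"{seconds_per_frame:.2f} s per frame")   # prints "0.62 s per frame"
print(f"{total_days:.1f} days for all frames")  # prints "14.4 days for all frames"
```

So at this rate the full run would need roughly two weeks.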

Thank you

Enumaris · December 14, 2017, 5:50pm

You are using an exceptionally deep neural network for an exceptionally large problem. It’s kind of expected that things would be very slow unless you have extremely powerful hardware. I don’t know what exactly you could do to make the process faster, but here are some thoughts:

1. Is it imperative that you use ResNet-152? This is such a deep network (152 layers!) that it requires a huge number of floating point operations on every forward pass. What are your requirements for the feature vectors that you extract? Looking at the graph at https://cloud.githubusercontent.com/assets/11435359/13046277/e904c04c-d412-11e5-9260-efc5b8301e2f.jpg, it seems to me that ResNet-152 gives diminishing returns over ResNet-101, which is shallower and should run quite a bit faster. If your requirements are not that high, you could even consider simply using ResNet-50, which already outperforms VGG-16, so it's not a weak network by any means. At 50 layers, ResNet-50 is still quite a deep neural network, but since it's only about 1/3 as deep as ResNet-152, it'll probably run roughly 3 times faster.
2. What hardware are you running on? Have you considered moving to multiple GPUs? The authors of ResNet-152 used 8 GPUs to train their model; otherwise it would have taken them a prohibitively long time to train. Obviously, if you can get multiple GPUs running, the process would be much faster.
3. Have you looked into which step is bottlenecking the feature extraction? I would suspect it's the part where you run the image through the network, features_var = resnet152(img_var), but it would be good to know exactly which step is the bottleneck and by how much. That would give you some insight into what you might be able to do to speed things up.
4. Have you considered parallelizing between CPU and GPU? For me, having the CPU do the data loading in parallel with the GPU running the neural net sped up my training by almost 100%. But different problems are different, so how much you gain from overlapping CPU and GPU work probably depends on point 3 above.