CPU memory keeps increasing when loading data

I’m doing a video classification task. To load video frames faster, I first write the frames to HDF5 files; the core code is like this:

from pathlib import Path
import numpy as np

frame_path = Path('frame_path')
with frame_path.open('rb') as fi:
    data = fi.read()
    # store the raw encoded bytes as a uint8 array; h5_file is the h5py file that stores frames
    h5_file[i] = np.frombuffer(data, dtype='uint8')
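
For context, a simplified sketch of the kind of writing loop this sits in; a variable-length uint8 dataset is one way to hold differently sized encoded frames, and 'frames.h5', frames_dir, and the exact dataset layout here are placeholders rather than the exact code:

from pathlib import Path

import h5py
import numpy as np

frame_paths = sorted(Path('frames_dir').glob('*.jpg'))  # placeholder frame directory

with h5py.File('frames.h5', 'w') as h5_file:
    # one variable-length uint8 entry per encoded (JPEG) frame
    dset = h5_file.create_dataset(
        'rgb', shape=(len(frame_paths),), dtype=h5py.vlen_dtype(np.uint8)
    )
    for i, frame_path in enumerate(frame_paths):
        with frame_path.open('rb') as fi:
            dset[i] = np.frombuffer(fi.read(), dtype='uint8')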

Then in the dataloader, I try to read frames like this:

import cv2 as cv
...
data = h5_file['rgb'][i]                     # encoded frame bytes as a uint8 array
data = cv.imdecode(data, cv.IMREAD_COLOR)    # decode to a BGR image
data = cv.cvtColor(data, cv.COLOR_BGR2RGB)   # convert BGR to RGB

or

import io

import numpy as np
from PIL import Image
...
with Image.open(io.BytesIO(data), 'r') as img:  # data holds the encoded frame bytes
    data = np.array(img)

For both of these methods (OpenCV or Pillow):
with Python 3.8.8, the CPU memory keeps increasing;
with Python 3.9.16, the CPU memory does not keep increasing without DistributedDataParallel (a single process), but it still keeps increasing with DistributedDataParallel (multiprocessing).
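
For reference, the read path sits inside a torch Dataset roughly like the sketch below; the FrameDataset name, the 'frames.h5' path, the fixed length argument, and the lazy per-worker file opening are placeholders and simplifications, not the exact code:

import cv2 as cv
import h5py
import torch
from torch.utils.data import DataLoader, Dataset


class FrameDataset(Dataset):  # placeholder name
    def __init__(self, h5_path, length):
        self.h5_path = h5_path
        self.length = length
        self.h5_file = None  # opened lazily so each worker process gets its own handle

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if self.h5_file is None:
            self.h5_file = h5py.File(self.h5_path, 'r')
        data = self.h5_file['rgb'][i]               # encoded frame bytes
        data = cv.imdecode(data, cv.IMREAD_COLOR)   # decode to BGR
        data = cv.cvtColor(data, cv.COLOR_BGR2RGB)  # BGR -> RGB
        return torch.from_numpy(data)


loader = DataLoader(FrameDataset('frames.h5', length=1000), batch_size=8, num_workers=4)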

I don’t know which libhdf5 version you are using, but based on this issue it might be a known issue in 1.21.0.
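
In case it helps, you can check which libhdf5 build h5py is linked against via its version info, e.g.:

import h5py

print(h5py.version.version)        # h5py version
print(h5py.version.hdf5_version)   # version of the linked libhdf5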

It seems that the suggestion in the issue you mentioned solves my problem, thanks very much!!!