CPU memory keeps increasing when loading data

I’m working on a video classification task. To load video frames faster, I first write the frames to HDF5 files. The core code looks like this:

from pathlib import Path
import numpy as np

frame_path = Path('frame_path')  # placeholder path to one encoded frame file
with frame_path.open('rb') as fi:
    data = fi.read()  # raw encoded bytes of the frame
# h5_file is the h5py file that stores the frames.
h5_file[i] = np.frombuffer(data, dtype='uint8')

Then in the dataloader, I try to read frames like this:

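# Method 1: OpenCV
import cv2 as cv
data = h5_file['rgb'][i]  # 1-D uint8 array of encoded bytes
data = cv.imdecode(data, cv.IMREAD_COLOR)
data = cv.cvtColor(data, cv.COLOR_BGR2RGB)

# Method 2: Pillow
import io
import numpy as np
from PIL import Image
data = h5_file['rgb'][i]
with Image.open(io.BytesIO(data), 'r') as img:
    data = np.array(img)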

With both methods (OpenCV or Pillow):
With Python 3.8.8, CPU memory keeps increasing.
With Python 3.9.16, CPU memory does not keep increasing without DistributedDataParallel (single process), but it still keeps increasing with DistributedDataParallel (multiple processes).
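
To compare these configurations, it can help to log each process's resident memory at regular intervals. Below is a minimal sketch, assuming a Unix-like system (it reads /proc/self/status on Linux and falls back to the peak value from the standard `resource` module elsewhere). `rss_mb` and `log_memory` are hypothetical helper names for illustration, not part of PyTorch or h5py.

```python
import os
import resource
import sys


def rss_mb():
    """Return this process's resident set size in MiB (best effort)."""
    try:
        # Linux: the VmRSS line reports the current RSS in kB.
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024
    except OSError:
        pass
    # Fallback: peak (not current) RSS; kB on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == 'darwin' else peak / 1024


def log_memory(step, every=100):
    """Print the RSS every `every` steps; return the value when it logs."""
    if step % every == 0:
        mb = rss_mb()
        print(f'pid={os.getpid()} step={step} rss={mb:.1f} MiB')
        return mb
    return None
```

Calling `log_memory(step)` from the training loop or from the dataset's `__getitem__` prints one line per process. Each DistributedDataParallel rank and each DataLoader worker is a separate process, so the pid in the output shows which one is growing.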

I don’t know which libhdf5 version you are using, but based on this issue it might be a known issue in 1.21.0.

It seems that the suggestion in the issue you mentioned solves my problem. Thanks very much!