When running inference with a PyTorch OCR model, GPU memory usage increases in an oscillating pattern as the batch size grows

During inference with an OCR model, experimental results indicate a non-linear increase in GPU memory usage as the batch size is raised.

The experiment takes a real image, resizes it to various sizes with OpenCV's resize (linear interpolation), and builds each batch by duplicating the resized image's numpy array.

import itertools

import cv2
import torch

det_model.eval().cuda()
origin_image = cv2.imread(image_path)

with torch.no_grad():
    # Sweep resize targets from 640 to 992 px in steps of 32 on both axes.
    for i, j in itertools.product(range(640, 1000, 32), repeat=2):
        torch.cuda.empty_cache()
        for z in range(1, 100):
            try:
                img = cv2.resize(origin_image, (i, j), interpolation=cv2.INTER_LINEAR)
                # Build a batch of z identical samples from the resized image.
                img = copy_img(img, batch_size=z)
                input_for_torch = torch.from_numpy(img).cuda()
                print(det_model(input_for_torch))
            except RuntimeError as e:
                if "CUDA out of memory" not in str(e):
                    raise
                print("{} × {} batch_size: {}    OOM".format(i, j, z))
                break  # larger batches at this resolution will also fail
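
copy_img is not shown in the snippet. A minimal sketch of what such a helper might look like, assuming it only stacks batch_size copies of the HWC image returned by cv2 along a new batch axis (any transposition or normalization the detection model expects is omitted here):

import numpy as np

def copy_img(img, batch_size):
    # Repeat the single HWC image along a new leading axis -> (batch_size, H, W, C).
    return np.repeat(img[np.newaxis, ...], batch_size, axis=0)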

By spawning child processes that continuously poll GPU usage, I recorded the peak memory consumption and inference time for each batch size, as shown in the following graph (one way to implement such a monitor is sketched after the graph).
[Graph: peak GPU memory usage and inference time per batch size]
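
The monitoring code is not shown in the question. One possible child-process monitor, assuming the NVML Python bindings (pynvml) are available (monitor_gpu, the shared Value, and the 0.01 s polling interval are illustrative choices, not the asker's actual code):

import time
from multiprocessing import Process, Value

import pynvml

def monitor_gpu(peak_bytes, device_index=0, interval=0.01):
    # Poll NVML in a loop and remember the highest "used" value observed.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while True:
        used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
        if used > peak_bytes.value:
            peak_bytes.value = used
        time.sleep(interval)

peak = Value('Q', 0)  # shared counter updated by the child process
Process(target=monitor_gpu, args=(peak,), daemon=True).start()
# ... run the inference loop above ...
print("peak GPU memory: {:.1f} MiB".format(peak.value / 2**20))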
Why do the results shown in the graph occur?

The oscillation is caused by manually clearing the cache in each iteration. Remove torch.cuda.empty_cache(); PyTorch's caching allocator will then reuse the blocks it has already reserved and only grow its memory pool when it actually needs more.
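
If the goal is only to measure per-batch peaks, PyTorch's own allocator counters can be read instead of clearing the cache; a sketch, reusing det_model and input_for_torch from the question:

import torch

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    det_model(input_for_torch)  # one inference pass at the batch size under test
print("peak allocated: {:.1f} MiB".format(torch.cuda.max_memory_allocated() / 2**20))
print("reserved by the caching allocator: {:.1f} MiB".format(torch.cuda.memory_reserved() / 2**20))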

I will experiment with this method, thanks a lot!