How do I properly free the GPU memory occupied by PyTorch when using Flask?

Here is the code:

from io import BytesIO
import base64
import gc

import numpy as np
import torch
from flask import Flask, request
from PIL import Image

app = Flask('test_api')

@app.route('/recognition', methods=['POST', 'GET'])
def recognition():
    imgid = request.values.get("ImgId")
    addtime = request.values.get("DateTime")
    img = request.values.get("Image")
    img = img.replace(' ', '+')
    torch.cuda.empty_cache()
    gc.collect()

    img_b64decode = base64.b64decode(img)  # decode the base64 string
    # num_img = np.fromstring(img_b64decode, np.uint8)  # convert to a NumPy array
    num_img = np.array(Image.open(BytesIO(img_b64decode)))
    predict = test_net()

if __name__ == '__main__':
    ...
    app.run(port=6321, debug=True, host="0.0.0.0", threaded=True)

After the endpoint has been called a few times, the GPU memory fills up.
I don't know whether this is related to Flask, so I would like to ask for advice.

Python: 3.6.8
Flask: 2.0.0
PyTorch: 1.5.1+cu101
GPU: RTX 2070

Make sure you are not storing unnecessary data, such as computation graphs, by appending the model's outputs, losses, etc. to e.g. a list. If you only care about the inference results, you could also wrap the forward pass in a with torch.no_grad() block, if that isn't already used. Besides that I cannot see any obvious issues in the posted code, but it also doesn't show a lot.
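
For example, a minimal inference helper could look like the sketch below; the torchvision ResNet-18, the preprocessing, and the name run_inference are only placeholders for whatever test_net() actually does:

import numpy as np
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).cuda().eval()  # stand-in for the real model

def run_inference(num_img: np.ndarray):
    # HWC uint8 image -> NCHW float tensor on the GPU (real preprocessing/normalization omitted)
    x = torch.from_numpy(num_img).float().permute(2, 0, 1).unsqueeze(0).cuda()
    with torch.no_grad():  # no autograd graph is built, so no intermediate activations are kept alive
        out = model(x)
    # Move the result off the GPU before returning it, so the response holds no CUDA tensor
    return out.argmax(dim=1).cpu().tolist()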

Thank you for your reply.
I tried with torch.no_grad(). With it, the GPU memory usage fluctuates up and down and the program runs normally at first, but after running for a while the GPU memory still fills up.
Moreover, "GPU memory is full" is not permanent: the usage keeps fluctuating up and down, so the endpoint still works occasionally.

Since you are emptying the cache, it's expected that the memory usage fluctuates; otherwise PyTorch would reuse the cached memory.
If you are running out of memory after a while without seeing the memory usage increase in each iteration, I would guess that some inputs might be larger than others (could this be the case)?
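
One way to check this is to log the allocated and reserved GPU memory together with the input shape on every request: a steady upward trend in allocated memory would point to something being kept alive, while isolated spikes would point to occasionally larger inputs. A minimal sketch (the helper name and print-based logging are just for illustration):

import torch

def log_gpu_memory(tag, num_img):
    # memory_allocated: memory used by live tensors; memory_reserved: memory held by the caching allocator
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"[{tag}] input shape={num_img.shape} allocated={alloc:.1f} MiB reserved={reserved:.1f} MiB")

# Inside recognition(), e.g.:
# log_gpu_memory("before", num_img)
# predict = test_net()
# log_gpu_memory("after", num_img)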