How to unload a model in C++?

I need to unload a model that was loaded with torch::jit::load() in C++, but I couldn’t find anything like a ‘torch::jit::unload()’.

  1. In my C++ code there are two functions, exposed as a Python API using pybind11:
#include <iostream>
#include <vector>
#include <string>

#include <torch/torch.h>
#include <torch/script.h>

#include <pybind11/pybind11.h>

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc.hpp>

torch::jit::script::Module initialize(const std::string &fpath) {
    torch::jit::script::Module model = torch::jit::load(fpath, torch::Device("cuda:0"));
    return model;
}

std::tuple<at::Tensor, at::Tensor> do_inference(torch::jit::script::Module model, ...) {
    ...  // pre-processing
    at::Tensor out = model.forward(input).toTensor();
    ...  // post-processing
}

PYBIND11_MODULE(foo, m) {
    m.def("initialize", &initialize, ...);
    m.def("do_inference", &do_inference, ...);
}
  2. Build it into a shared library and import the module in Python 3.6. The Python process is a web service, like:
import foo
from flask import Flask  # a web service

model = foo.initialize("./")

img = open("./kitty.jpg", "rb")

# This service is running
  3. The web service works fine, but how do I unload the model from the device (GPU or CPU) when I want to update the model without stopping the service?
    Is there a solution for that?

Could you just del model and clear the cache?
I don’t see any modules, references, etc. being stored on the C++ side, so I assume the objects can be freed from your Python script.

Yes, I want to del model and clear the cache, but NOT shut down the Python process.
In fact, the C++ code does the inference (including pre-processing and post-processing); the Python process (which runs as a service) loads the model and runs inference by calling the C++ dynamic library (there may be several: one dynamic library per model).
So, is there a way to delete a loaded model without stopping the Python process?

Yes, you can delete the model by running del model in the Python script without shutting it down.

I tried del model in the Python process, but it doesn’t seem to work.
When I check nvidia-smi, the process is still there and the GPU memory is not released.
It seems the model lives in the C++ context and must be released from C++, right?

The memory might still be held by the caching allocator and thus not released back to the driver. You could run torch.cuda.empty_cache() to get it back. I’m not 100% sure, but I don’t think you need to release it in the C++ backend, since you are holding a reference to it in Python.
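Putting the suggestions above together, a model swap without restarting the process could be sketched like this. This is a minimal sketch, not the thread's actual API: `load_fn` stands in for the pybind11 entry point (e.g. foo.initialize), and the holder dict is just one way to make sure no stale Python reference keeps the old torch::jit Module alive.

```python
import gc
import torch

def swap_model(holder, load_fn, path):
    """Replace holder["model"] with a freshly loaded model.

    `load_fn` is a placeholder for the pybind11 entry point
    (e.g. foo.initialize); `path` is the new model file.
    """
    holder["model"] = None       # drop the old reference before loading the new model
    gc.collect()                 # collect the old module now rather than "eventually"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
    holder["model"] = load_fn(path)
    return holder["model"]
```

A service would then keep its model in the holder, e.g. `state = {"model": foo.initialize("./old.pt")}`, and call `swap_model(state, foo.initialize, "./new.pt")` from an update handler.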

I used torch.cuda.memory_summary("cuda") before and after del model, and the “Allocated memory” was indeed released after del model. It seems to have worked. Thanks.
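For a quick numeric check, torch.cuda.memory_allocated() reports the same “live tensor” figure that memory_summary() prints. A small sketch (the helper name is my own) that also explains why nvidia-smi looks unchanged:

```python
import torch

def allocated_mb(device="cuda:0"):
    """MiB currently held by live tensors on `device` (0.0 if CUDA is absent)."""
    if not torch.cuda.is_available():
        return 0.0
    # memory_allocated() counts only live tensors. nvidia-smi additionally shows
    # the CUDA context and the caching allocator's pool, so its number does not
    # drop right after `del model` unless torch.cuda.empty_cache() is called.
    return torch.cuda.memory_allocated(device) / 2**20
```

Calling it before del model, after del model, and again after torch.cuda.empty_cache() shows which step releases which part of the memory.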