Recently I have been using libtorch for inference. I converted the original model to a .pt file with the torch.jit.trace() API in Python:
traced_script_module = torch.jit.trace(base_model, example)
traced_script_module.save(output_path)
Then I loaded it in C++ with the torch::jit::load() API and called eval() on the module:
module = torch::jit::load(model_path);
module->eval();
However, libtorch uses much more GPU memory for forward() than the original module does in Python, with the same image size.
To rule out the export step, I reloaded the .pt model in Python and found that it uses the same amount of GPU memory as the original module.
So the C++ runtime seems to use much more memory than Python for the same model and the same image size.
How can I fix this? Has anyone else run into the same problem? Thanks.
PyTorch version: 1.1.0
Python version: 3.7.0
OS: Windows 10
CUDA/cuDNN: CUDA 10 + cuDNN for CUDA 10.0, v184.108.40.206