Libtorch uses much more GPU memory than Python?

Recently, I have been using libtorch for prediction. I saved the original model to a .pt file using the torch.jit.trace() API.

    traced_script_module = torch.jit.trace(base_model, example)
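For context, the tracing/saving step can be sketched as follows. This is a minimal sketch: the tiny Sequential network, the input shape, and the file name are placeholders standing in for the real model and data.

```python
import torch
import torch.nn as nn

# Stand-in for the real network; replace with your own module and input shape.
base_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
base_model.eval()  # switch to inference mode before tracing

example = torch.rand(1, 3, 32, 32)  # dummy input of the expected shape
traced_script_module = torch.jit.trace(base_model, example)
traced_script_module.save("traced_model.pt")  # this file is what C++ loads
```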

Then I loaded it in C++ using the torch::jit::load() API and called model.eval().

    module = torch::jit::load(model_path);

But I found that libtorch occupies much more GPU memory during forward() than the original Python module does with the same image size.

So I reloaded the .pt model in Python and found that it has the same memory footprint as the original module.
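The Python-side reload check looks roughly like this (a minimal sketch; the small network, file name, and input shape are placeholders, and the first few lines only exist to make the example self-contained). Note the torch.no_grad() guard, which is how Python inference usually avoids allocating autograd state:

```python
import torch
import torch.nn as nn

# Placeholder setup: trace and save a tiny model so the reload step below
# runs on its own; in the real case this is the exported .pt file.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
net.eval()
torch.jit.trace(net, torch.rand(1, 3, 32, 32)).save("traced_model.pt")

# Reload the traced model in Python and run a forward pass, mirroring
# the C++ side, to compare memory usage.
module = torch.jit.load("traced_model.pt")
module.eval()
with torch.no_grad():  # avoid keeping autograd buffers during inference
    out = module(torch.rand(1, 3, 32, 32))
```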

It seems that C++ uses much more memory than Python with the same model and the same image size.

How can I deal with this? Has anyone else seen the same problem? Thanks.

PyTorch version: 1.1.0
Python version: 3.7.0
OS: Windows 10
IDE: VS2015
CUDA 10.0 + cuDNN v7.5.1.10



I am observing the same behaviour. Did you find out the reason for this?


@Riddick_Gao @tholzmann Could you share code that reproduces this problem? And does it only happen on Windows?

cc. @peterjc123

On my side it happens on Windows; I'm currently not able to test it on Linux.

A snippet of my code:

    // Load the traced model from a binary stream onto the CPU device.
    std::ifstream in(fileName, std::ios_base::binary);
    m_TorchDevice = std::unique_ptr<torch::Device>(new torch::Device(torch::DeviceType::CPU));
    m_Model = torch::jit::load(in, *m_TorchDevice);
    // inputsVec is a std::vector<torch::jit::IValue> holding the input tensor.
    torch::Tensor resultTensor = m_Model.forward(inputsVec).toTensor();

Note that I have observed the same behaviour on both GPU and CPU.