Libtorch uses much more GPU memory than Python?

Riddick_Gao · May 15, 2019, 12:14pm

I have recently been using libtorch for inference. I saved the original model as a .pt file using the torch.jit.trace() API in Python.

    import torch

    traced_script_module = torch.jit.trace(base_model, example)
    traced_script_module.save(output_path)

Then I loaded it in C++ with the torch::jit::load() API and called eval() on the module.

    auto module = torch::jit::load(model_path);
    module->eval();

However, I found that libtorch uses much more GPU memory for forward() than the original module does in Python, with the same image size.

So I reloaded the .pt model in Python and found that it has the same performance as the original module.

It seems that C++ uses much more memory than Python with the same model and the same image size.
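To put numbers on the Python side of this comparison, one option is to record the allocator's peak around a single forward pass for both the original and the reloaded traced module. The following is only a minimal sketch under stated assumptions: `peak_during` and `compare_eager_and_traced` are hypothetical helper names, `example` is assumed to already be a CUDA tensor, and the counters used are `torch.cuda.reset_max_memory_allocated` and `torch.cuda.max_memory_allocated` (newer releases prefer `torch.cuda.reset_peak_memory_stats` for the reset). These counters only cover memory handed out by PyTorch's caching allocator, so the figures can be lower than what nvidia-smi or Task Manager reports.

```python
from typing import Callable


def peak_during(run: Callable[[], object],
                reset_peak: Callable[[], None],
                read_peak: Callable[[], int]) -> int:
    """Reset a peak counter, execute run(), and return the peak it reached."""
    reset_peak()
    run()
    return read_peak()


def compare_eager_and_traced(base_model, traced_path, example):
    """Hypothetical driver: needs PyTorch and a CUDA device; not called here.

    `example` is assumed to be a tensor that already lives on the GPU.
    """
    import torch  # imported lazily so peak_during itself has no dependencies

    def forward_once(model):
        def run():
            with torch.no_grad():        # inference only, no autograd state
                model(example)
            torch.cuda.synchronize()     # wait for queued kernels to finish
        return run

    eager = base_model.cuda().eval()
    traced = torch.jit.load(traced_path).cuda().eval()

    # Both models stay resident during each measurement, so the weights are
    # counted in both peaks; the difference reflects the forward pass itself.
    results = {}
    for name, model in (("eager", eager), ("traced", traced)):
        results[name] = peak_during(
            forward_once(model),
            torch.cuda.reset_max_memory_allocated,
            torch.cuda.max_memory_allocated,
        )
    return results  # peak bytes per variant
```

Calling `compare_eager_and_traced(base_model, output_path, example)` on a machine with a GPU would return the two peaks in bytes. Whether the forward pass runs with or without gradient tracking affects how much memory it holds on to, so it is worth making sure the Python and C++ runs being compared are set up the same way in that respect.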

How can I deal with this? Does anyone else have the same problem? Thanks.

PyTorch version: 1.1.0
Python version: 3.7.0
OS: Windows 10
IDE: VS2015
cuda10 + cudnn10.0 v7.5.1.10

tholzmann · September 27, 2019, 1:20pm

Hi,

I am observing the same behaviour. Did you find out the reason for this?

Best,
Thomas

yf225 · September 27, 2019, 2:49pm

@Riddick_Gao @tholzmann Could you share code that reproduces this problem? And does it only happen on Windows?

cc. @peterjc123

tholzmann · October 1, 2019, 6:26am

On my side it happens on Windows; I’m currently not able to test it on Linux.

A snippet of my code:

    // Load the traced model from a binary stream onto the CPU device.
    std::ifstream in( fileName, std::ios_base::binary );
    m_TorchDevice = std::unique_ptr<torch::Device>( new torch::Device( torch::DeviceType::CPU ) );
    m_Model = torch::jit::load( in, *m_TorchDevice );
    m_Model.eval();
    // Run a single forward pass and unpack the result as a tensor.
    torch::Tensor resultTensor = m_Model.forward( inputsVec ).toTensor();

Note that I have observed the same behaviour on both GPU and CPU.

Best,
Thomas