I find that memory usage is very high when I load a CUDA model. I ran some experiments and found that if I load a traced model with torch::jit::load, the memory cannot really be released when the model lives on CUDA.
here is my testing code:
#include &lt;torch/script.h&gt;
#include &lt;iostream&gt;
#include &lt;memory&gt;

using namespace std;

class model
{
public:
    torch::jit::script::Module module;
    void load()
    {
        // a CUDA model; I also prepared a CPU one
        module = torch::jit::load("/home/gino/textDetectorCuda.pt");
    }
};

int main()
{
    {
        // stage 1: initialization (model not loaded yet)
        cout << "initialization ..." << endl;
        unique_ptr<model> myModel = make_unique<model>();
        cin.get();

        // stage 2: load the model
        cout << "load ..." << endl;
        myModel->load();
        cin.get();

        // stage 3: release the unique_ptr
        cout << "try reset ... " << endl;
        myModel.reset();
        cin.get();
    }

    // stage 4: outside the scope
    cout << "try outside ... " << endl;
    cin.get();
    cout << "bye~" << endl;
    return 0;
}
The test has 4 stages: 1. run the program and do nothing; 2. load the model via jit; 3. reset the unique_ptr that holds the jit module; 4. fall outside the enclosing scope (so even if I mishandled the pointer somehow, it would still be released here).
I use the following Linux command to check memory usage:
free -m
and watch the "available" value to see whether the memory has been freed.
I used jit to trace EasyOCR's text detection model and saved both a CPU and a CUDA version. However, the model itself doesn't matter for this test.
Here is the testing result
- available memory of CPU model
stage.0 (before running the program ): 6650
stage.1 (run the program and do nothing) : 6593
stage.2 (load the model) : 6515
stage.3 (release the outer class): 6590
stage.4 (before end the program): 6594
As you can see, the memory is released successfully when I use the CPU model. However, if I load the CUDA model the usage is huge and behaves weirdly.
- available memory of CUDA model
stage.0 (before running the program ): 6545
stage.1 (run the program and do nothing) : 6459
stage.2 (load the model) : 5344
stage.3 (release the outer class): 5342
stage.4 (before end the program): 5340
Now you can see that even after I reset the outer class, the memory is not released. The CUDA model is too large to ignore, so I'm looking for a way to release it once the model has finished its job. I found some discussion saying that cudaDeviceReset() resets everything in CUDA and frees its memory usage. However, I'm curious whether the usage in host RAM can be released as well.
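For reference, besides cudaDeviceReset() there is libtorch's own caching-allocator call, c10::cuda::CUDACachingAllocator::emptyCache(). A minimal sketch of what I have been trying (assuming the same model path as above); note this only hands cached device blocks back to the driver, and I am not sure it touches the host-RAM part at all:

```cpp
#include <torch/script.h>
#include <c10/cuda/CUDACachingAllocator.h>

int main() {
    {
        torch::jit::script::Module module =
            torch::jit::load("/home/gino/textDetectorCuda.pt");
        // ... run inference ...
        module.to(torch::kCPU);  // move parameters off the GPU first
    }  // module goes out of scope here; its tensors are freed to the cache

    // Return the cached (now-unused) GPU blocks to the CUDA driver.
    c10::cuda::CUDACachingAllocator::emptyCache();
    return 0;
}
```

My understanding is that the large host-RAM jump at load time comes mostly from the CUDA context that the driver creates on first CUDA use, which stays allocated until cudaDeviceReset() or process exit, so emptyCache() alone may not bring the "available" number back.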
So, how do I release a torch::jit::script::Module correctly?