Hi, I’m currently using the C++ interface to load JIT-compiled models and it works like a charm.
However, there isn’t much control over memory usage. I would like to do two things:
- Firstly, I need to clear the cache after running a model. The closest thing I found in the torch code were calls to c10::cuda::CUDACachingAllocator::emptyCache(); would this indeed be the right function to call? It doesn’t seem to be exposed publicly, but I could probably find a way to change the code to call it.
- Secondly, what would be extremely useful is a minimal-memory-usage mode where intermediate activations are released as soon as possible. When you are only interested in the output, you don’t need the activations, and this would really help when using Torch in production!
It would work like the NoGrad guard, but go further: basically a NoGradNoState guard. Is there already a way to achieve this? Or can someone perhaps outline how one would implement it in the code?
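For the first point, here is roughly what I have in mind; this is only a sketch, assuming the c10/cuda/CUDACachingAllocator.h header is available from my build of libtorch, and "model.pt" stands in for my actual model path:

```cpp
#include <torch/script.h>
#include <c10/cuda/CUDACachingAllocator.h>

int main() {
  {
    // Load and run the JIT-compiled model on the GPU.
    torch::jit::script::Module module = torch::jit::load("model.pt");
    module.to(torch::kCUDA);
    auto input = torch::randn({1, 3, 224, 224}, torch::kCUDA);
    auto output = module.forward({input}).toTensor();
  }  // module, input, and output go out of scope here

  // Return cached, currently-unused blocks to the CUDA driver.
  // My understanding is that this only frees blocks not backing
  // live tensors, which is why the tensors are scoped above.
  c10::cuda::CUDACachingAllocator::emptyCache();
  return 0;
}
```

If emptyCache() is not meant to be called from user code, I’d appreciate a pointer to whatever the supported way is.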
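For the second point, the closest I can get today is the NoGrad guard, which at least stops the autograd graph (and hence the saved activations needed for backward) from being built. A sketch of how I currently run inference:

```cpp
#include <torch/script.h>

// Run a forward pass without recording autograd state.
torch::Tensor run_inference(torch::jit::script::Module& module,
                            const torch::Tensor& input) {
  torch::NoGradGuard no_grad;  // RAII guard: autograd disabled in this scope
  return module.forward({input}).toTensor();
}
```

What I’m asking about would go beyond this: a hypothetical NoGradNoState guard that also lets the runtime free each intermediate activation as soon as the last op consuming it has run, rather than keeping them alive until the whole forward pass returns.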