Is there a workaround for the given version? Retraining all the models would be time-consuming, and the existing models are incompatible with a newer smp/PyTorch.
What can cause this memory leak? Is it an old bug that is fixed in more recent versions of PyTorch?
Thank you for your response.
So, how do I check for a memory leak? The situation: when I analyze some 10 photos one by one, it works fine; then the application fails with a memory allocation error. I shrink all photos to at most 10 megapixels, and I know my GPU can hold and process a 10-megapixel photo.
But there is a clear memory buildup, possibly due to memory fragmentation.
I do
del tensor
torch.cuda.empty_cache()
after each analysis.
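For context, deleting the tensor and emptying the cache frees the input, but a very common cause of growth during inference is the autograd graph kept alive by forward passes run without torch.no_grad(). A minimal sketch of a per-photo loop with that closed off; the one-layer model, sizes, and the analyze name are stand-ins for illustration, not the actual pipeline:

```python
import torch

# Stand-in for a real segmentation model (hypothetical, for illustration).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 1, 3, padding=1).to(device).eval()

def analyze(photo: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():       # without this, each forward pass
        x = photo.to(device)    # keeps its autograd graph alive
        out = model(x)
        result = out.cpu()      # move the answer off the GPU
    del x, out                  # drop GPU references
    if device == "cuda":
        torch.cuda.empty_cache()  # return cached blocks to the driver
    return result

# One fake "photo": batch of 1, 3 channels, 64x64 pixels.
print(analyze(torch.rand(1, 3, 64, 64)).shape)
```

If results are appended to a list across photos, storing the `.cpu()` copy (as above) rather than the raw GPU output also matters, since a list of GPU tensors pins their memory.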
What can cause the problem and how to fix it?
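On the fragmentation guess: a toy free-list model (pure Python, invented block sizes) shows how a GPU can have enough free bytes in total yet fail one allocation, because no single contiguous block is large enough:

```python
# Toy model of a fragmented cache: free block sizes in MiB.
# The numbers are made up for illustration only.
free_blocks = [900, 800, 700, 300]   # 2700 MiB free in total
request = 1200                       # one allocation of 1200 MiB

total_free = sum(free_blocks)
fits_somewhere = any(block >= request for block in free_blocks)

print(total_free)       # plenty of bytes overall
print(fits_somewhere)   # False: yet no single block can hold it
```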
PS We are working on the analysis of skin health if it helps.
PPS. In the actual application I load my models (taking some 2 GB out of 8 GB). Then I receive and analyze photos. Each photo is shrunk to at most 10 megapixels prior to analysis. It works fine on the first ten photos, then fails with:
RuntimeError: CUDA out of memory.
Tried to allocate 1.17 GiB (GPU 0; 7.93 GiB total capacity;
3.47 GiB already allocated;
1.03 GiB free; 6.14 GiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting
max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and
PYTORCH_CUDA_ALLOC_CONF
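Unpacking the numbers in that traceback (only the arithmetic is added here) does point at fragmentation rather than a plain shortage of memory:

```python
# Values copied from the error message above, in GiB.
GiB = 2**30
allocated = 3.47 * GiB   # backing live tensors
reserved  = 6.14 * GiB   # held by PyTorch's caching allocator
free      = 1.03 * GiB   # not touched by PyTorch at all
request   = 1.17 * GiB   # the allocation that failed

cached_unused = reserved - allocated   # cached by PyTorch but idle
print(round(cached_unused / GiB, 2))   # cached-but-idle memory in GiB
print(cached_unused + free > request)  # True: bytes exist,
                                       # just not contiguously
```

Since reserved (6.14 GiB) is well above allocated (3.47 GiB), the message's own suggestion applies: try launching with PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:&lt;value&gt; set in the environment; the right value is workload-dependent.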
Can you please expand your explanation?
Do I have a memory leak or not?
When I run my app and analyze the same photo a hundred times, it crashes with “out of memory. Tried to allocate 1.2 GB of memory, only have 1.1 GB free”.