I am noticing a ~3 GB increase in CPU (host) RAM usage after the first .cuda() call. I recently updated PyTorch from v1.6 to v1.9.0+cu111, and after the upgrade I see an increase of ~3 GB in RAM utilization when I load the model.
Both PyTorch releases you are using are old by now, so update to the latest stable or nightly release and check if the issue persists. For example, CUDA's lazy loading was enabled, which should reduce device memory usage on CUDA >= 11.7 and additionally host memory usage on CUDA >= 11.8.
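To see whether lazy loading helps in your setup, you can measure the host RSS of the process around the first CUDA call. This is a minimal sketch, assuming Linux (it reads /proc); setting CUDA_MODULE_LOADING=LAZY before CUDA initialization is what opts in to lazy loading on CUDA >= 11.7 (newer PyTorch releases set it for you):

```python
import os

# Request CUDA lazy module loading (honored by CUDA >= 11.7);
# must be set before the CUDA runtime is first initialized.
os.environ.setdefault("CUDA_MODULE_LOADING", "LAZY")

def rss_gb():
    """Resident set size of this process in GB (Linux /proc interface)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024 ** 2  # value is in kB
    return float("nan")

print(f"RSS before CUDA init: {rss_gb():.2f} GB")
try:
    import torch
    if torch.cuda.is_available():
        torch.randn(1).cuda()  # first CUDA call creates the context
        print(f"RSS after CUDA init:  {rss_gb():.2f} GB")
except ImportError:
    pass  # torch not installed; only the measurement helper runs
```

Comparing the two printed values across CUDA 11.7 and 11.8 builds should show the host-memory difference directly.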
I use Ubuntu 20.04 as the base image to package the model as a service for inference. When I updated from torch 1.6 to 1.9, the image size grew from 7 GB to 14 GB.
I also see the warning below when I try to import torchvision.
root@a761eb87f45e:/var/log/supervisor# python3
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
/opt/conda/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libc10_hip.so: cannot open shared object file: No such file or directory
warn(f"Failed to load image Python extension: {e}")
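That warning usually indicates a torch/torchvision build mismatch: your torchvision is looking for libc10_hip.so, i.e. a ROCm library, while your torch is a CUDA build. A quick hedged sketch to compare the installed versions (the `report_versions` helper is just for illustration; checking `torch.version.cuda` vs. `torch.version.hip` also tells you which build you have):

```python
import importlib
import importlib.util

def report_versions():
    """Collect torch/torchvision versions to spot a CPU/CUDA/ROCm mismatch."""
    out = {}
    for name in ("torch", "torchvision"):
        if importlib.util.find_spec(name) is None:
            out[name] = "not installed"
            continue
        mod = importlib.import_module(name)
        out[name] = mod.__version__
    return out

versions = report_versions()
for name, ver in versions.items():
    print(f"{name}: {ver}")
```

If the versions do not match a compatible pair (e.g. a CUDA torch with a ROCm torchvision wheel), reinstall both from the same index so the C++ extensions link against the same libraries.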
Hi @ptrblck, I updated to the latest torch v2.0.1 with CUDA 11.7. I see some improvements in GPU utilization and RAM usage. What are the improvements in CUDA 11.8? Do you advise updating to CUDA 11.8?