Need guidance on reducing PyTorch dependency size inside a Docker image when using nvidia-container-toolkit

Dear All, Greetings.
I need to containerise an AI/ML module which is programmed to use the following:
model: Hugging Face bge-reranker-v2-m3 (reranker)
needed dependency: torch (PyTorch)
hardware: NVIDIA GPU
format: container-based deployment/runtime
dependencies installed via pip: PyTorch, which brings in all the NVIDIA CUDA packages

I am experienced with general containerization, but new to the GPU/CUDA ecosystem.
In the Dockerfile, I am installing PyTorch via pip (the default CUDA GPU variant), and it pulls in a lot of dependencies.
The total image size is around 4.4 GB.
I need to run this image on a Docker runtime that has access to NVIDIA GPU hardware.
I also have to learn how to assemble/set up the container runtime on the GPU machine.
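
For reference, here is a rough sketch of what my Dockerfile currently looks like (the base image, library list, and entrypoint below are placeholders, not my exact files):

```dockerfile
# Placeholder base image; the real one differs
FROM python:3.11-slim

WORKDIR /app

# The default pip build of torch on Linux pulls in the NVIDIA CUDA wheels
# (cuDNN, cuBLAS, etc.), which is where most of the ~4.4 GB comes from.
RUN pip install --no-cache-dir torch transformers

COPY . /app

# Placeholder entrypoint for the reranker service
CMD ["python", "serve.py"]
```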

I have spent some time reading about nvidia-container-toolkit.
Please advise/guide me on the following points:

  1. Will installing nvidia-container-toolkit on the host help reduce any of the pip dependencies inside the container image?
  2. Do I also need to install the CUDA toolkit on the host (cuda-toolkit)?
  3. Do I also need to install the NVIDIA GPU driver / GDS on the host (nvidia-gds)?
  4. Do any of nvidia-container-toolkit, libnvidia-container1, or nvidia-container-tools contain CUDA-related driver content / .so files needed to run PyTorch?
  5. Mainly, what is the difference between PyTorch’s NVIDIA CUDA pip dependencies and the libraries provided by nvidia-container-runtime?

  1. No.
  2. No, you would need to install an NVIDIA driver and properly set up the container environment.
  3. You need to install the NVIDIA driver, but don’t need to install GDS.
  4. These libs are needed to provide GPU support for your docker containers.
  5. The CUDA libraries hosted as pip wheels on PyPI provide the runtime library dependencies, and PyTorch uses these to execute its kernels on the GPU. The container runtime is unrelated to PyTorch or any other CUDA application; it allows you to use GPUs in docker containers.
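
As a rough illustration of that split (the image name below is a placeholder for your own image):

```bash
# Inside the image: the CUDA user-space libraries come from pip wheels
# (package names vary by torch build, e.g. nvidia-cudnn-cu12, nvidia-cublas-cu12)
docker run --rm my-reranker:latest pip list | grep -i nvidia

# From the host side: the container toolkit injects the driver libraries
# (libcuda.so etc.) at run time when GPUs are requested
docker run --rm --gpus all my-reranker:latest \
  python -c "import torch; print(torch.cuda.is_available())"
```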

Thanks a lot @ptrblck!
I need one more clarification regarding the setup of the NVIDIA Container Toolkit. I am not sure if this is the right forum to ask this; kindly point me to the appropriate forum if not:

My production runtime is a Linux machine that installs ‘rpm’-format packages (RHEL, Azure Linux, or Amazon Linux). That machine would have an NVIDIA GPU [example: AWS EC2 g4dn instances, which have NVIDIA T4 GPUs]. I would install the Docker runtime on it and configure it to use the GPU via the NVIDIA Container Toolkit.
On this Docker host, I intend to run an image containing the Hugging Face model along with PyTorch and its NVIDIA CUDA dependencies.
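
Roughly, I intend to start the container like this (the image name and port mapping are placeholders for my actual service):

```bash
# Request all GPUs via the NVIDIA container runtime and expose the service port
docker run -d --gpus all -p 8080:8080 my-reranker:latest
```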

My query is: which rpm packages need to be installed on the host, and which rpms (if any) should be installed inside the container image?

I have gone through the following documentation:

Architecture Overview:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/arch-overview.html

Install guide:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

I am still not clear. I connected to the NVIDIA rpm repository and downloaded all the available rpm files. I see the following:
libnvidia-container1
libnvidia-container-tools
nvidia-container-runtime
nvidia-container-toolkit
nvidia-container-toolkit-base (I think this gets covered by nvidia-container-toolkit)
nvidia-docker2
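
To see what each of these actually ships, I also listed their contents without installing them (the filenames below are placeholders for the exact versions I downloaded):

```bash
# Query the file list of a downloaded rpm without installing it
rpm -qlp libnvidia-container1-<version>.rpm
rpm -qlp nvidia-container-toolkit-<version>.rpm
```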

From the above list of rpms, which should be installed on the host, and which inside the container image?
Kindly guide me.

I’m not familiar with AWS instances and would assume they are already configured to run CUDA applications. I set up nodes locally with docker by following this guide.
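
Once the toolkit is installed on the host and docker is configured, the install guide’s sample workload is a quick way to check the setup (this is the example from the linked docs):

```bash
# Sample workload from the install guide: nvidia-smi is injected into the
# plain ubuntu container by the NVIDIA container runtime
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```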

Hi @ptrblck,
Thanks for your response. As per the install guide documentation, it seems only one rpm package, nvidia-container-toolkit, needs to be installed on the host/server, and nothing needs to be installed inside the container image to connect to it. Kindly confirm my understanding.
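
Concretely, the host-side steps I am planning to run are roughly the following, taken from the install guide (the repository setup command may change over time, so I will re-check the guide before running it):

```bash
# Add the NVIDIA Container Toolkit rpm repository (per the install guide)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install the toolkit; libnvidia-container1, libnvidia-container-tools and
# nvidia-container-toolkit-base come in as dependencies
sudo dnf install -y nvidia-container-toolkit

# Point docker at the NVIDIA runtime and restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```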