Cudnn vs cudatoolkit

uumami · June 15, 2022, 4:45am

Hello, we are creating our own images for training and deployment using conda. Our base image is

 FROM nvidia/cuda:11.4.2-base-ubuntu18.04
 RUN conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

I noticed that cudatoolkit increases the size of the image substantially, so I was wondering whether it would be enough to install only cudnn via conda to make the image lighter. We are not sure whether that would affect performance or anything else.
We use several libraries and frameworks with a PyTorch backend, and this will be the base image for them, so we would like to know the minimal installation/requirements for pytorch-gpu. We both train and run inference, and these could use different images!

Thanks :3

ptrblck · June 15, 2022, 5:14am

No, the PyTorch conda binaries depend on cudatoolkit, so it needs to be installed.
Since you are using a CUDA base container, you might want to build PyTorch from source, as it would then use the locally shipped CUDA toolkit instead of the prebuilt binary.
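
If the conda binaries are kept, one modest way to trim the image is to clear conda's package cache in the same `RUN` layer as the install. This is only a sketch: it assumes conda is already installed and on `PATH` (as in the original snippet), and it does not remove cudatoolkit itself, only the downloaded tarballs and caches left over from the install.

```dockerfile
FROM nvidia/cuda:11.4.2-base-ubuntu18.04

# ... (conda installation elided, as in the original post) ...

# Install and clean up in a single layer; cleaning in a later RUN
# would not shrink the image, because the earlier layer keeps the cache.
RUN conda install -y pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 \
        cudatoolkit=11.3 -c pytorch -c conda-forge \
    && conda clean -afy
```

How much this saves depends on the environment, so it is worth comparing the image sizes before and after.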

uumami · June 15, 2022, 5:38am

Thanks Patrick!
We tried building from source using the official make-docker files from PyTorch, but our attempts were unsuccessful.

Is there a way to “easily” build PyTorch from source without doing it manually inside a container and then creating an image from that container? We need to keep the build reproducible. Is there a command that builds from source with GPU support and can be included in the Dockerfile?

ptrblck · June 15, 2022, 6:08am

I’m not sure what “image” means in this context, but in case you are planning to release a docker container, the build instructions from here should be sufficient. The easier way would of course be to download the conda binary or pip wheel, but then you would have to deal with the size. Note that the CUDA toolkit (and other libraries) inside the container would not be small either, so the right approach might depend on your overall goal.

uumami · June 15, 2022, 4:32pm

The problem with compiling from source is that, in order to compile with GPU support, we need to launch a container from the image. Otherwise, the image cannot see or communicate with the GPU. So after creating a Docker image from a Dockerfile, we launch the container with --gpus all, compile from source (with GPU), and convert that container to an image (following the steps mentioned here).

Our pipeline requires us to avoid that kind of practice and to keep the build reproducible from a Dockerfile and build arguments (no container-to-image conversion). I tried using the Makefile from the GitHub repository (here) without success.

:3
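
For the question above about building with GPU support directly from a Dockerfile, here is a rough, untested sketch. As far as I know, compiling the CUDA code needs nvcc but not a visible GPU: when `TORCH_CUDA_ARCH_LIST` is set, the PyTorch build compiles for the listed compute capabilities instead of detecting a device, so a plain `docker build` should be enough. The `-cudnn8-devel` tag is an assumption (the `-base` image does not ship nvcc), and the architecture list and the `/opt/pytorch` path are illustrative choices, not values from this thread.

```dockerfile
# Assumption: a devel image with nvcc and cuDNN; check that this tag is available.
FROM nvidia/cuda:11.4.2-cudnn8-devel-ubuntu18.04

# Build argument: compute capabilities to compile for (illustrative default).
ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"

RUN apt-get update \
    && apt-get install -y --no-install-recommends git build-essential ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# ... (conda installation elided, as in the original post) ...

# Build dependencies, following the list in the PyTorch README for this release.
RUN conda install -y astunparse numpy ninja pyyaml mkl mkl-include setuptools \
        cmake cffi typing_extensions future six requests dataclasses \
    && conda clean -afy

# Fetch the sources together with their submodules.
RUN git clone --recursive --branch v1.10.1 https://github.com/pytorch/pytorch /opt/pytorch
WORKDIR /opt/pytorch

# USE_CUDA=1 requests a CUDA build; no GPU needs to be visible during this step.
RUN CMAKE_PREFIX_PATH="$(dirname "$(which conda)")/../" USE_CUDA=1 python setup.py install
```

The target architectures can then be chosen per image at build time, for example `docker build --build-arg TORCH_CUDA_ARCH_LIST="8.0" -t my-pytorch .` (the image name is hypothetical). Listing fewer architectures should also shorten the build and reduce the size of the compiled libraries.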