PyTorch 1.9 with CUDA 11.0?

I am using GPUs on Google Cloud (GCP), and it appears the only machine image they provide ships with CUDA 11.0 (!). As far as I can tell, prebuilt CUDA 11.0 binaries only exist for PyTorch <= 1.7.

I am creating a Dockerfile for my project. However, some of my library’s dependencies require PyTorch 1.9, so installing them replaces the PyTorch 1.7 GPU build with the PyTorch 1.9 CPU build.

I think PyTorch 1.9 is a must, but I am not sure which workaround is the least painful:

  1. Can I use CUDA 10.2 inside the Docker container, even though the host system has CUDA 11.0? Or will that cause problems? I have seen conflicting advice on this.
  2. I could try to build my own Google machine image. This seems very painful, though.
  3. I could try to build PyTorch 1.9 from source in my Docker image and use CUDA 11.0 there. I haven’t found good Dockerfiles explaining how to do this.

Any other suggestions?

I don’t have an exact answer to your question, but I tried something similar to your third option here: "Torch CUDA unknown error, but CUDA and nvidia-smi properly installed on Azure K8s Service" (PyTorch Forums). I’m using Microsoft Azure instead of Google Cloud, but the same principle applies to building a Docker image. The example I have there installs CUDA and nvidia-smi correctly, and CUDA 11.4 is detected, but unfortunately PyTorch can’t initialize CUDA properly for some reason. I’d also be interested if you find a solution using your third option.

  1. Yes, you should be able to use a CUDA 10.2 Docker container, as its driver requirement would be met by the newer CUDA 11 driver.
  2. I don’t know what a Google machine image is or how hard it would be to build one.
  3. You could reuse the Dockerfile from the PyTorch repository.
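
Whichever option you pick, it may help to confirm inside the container which PyTorch build actually ended up installed, since the question describes a GPU build being swapped for a CPU one. Here is a minimal sketch of such a check. `describe_torch_build` is a hypothetical helper name, and it assumes the usual wheel naming where the version string carries a local tag such as `+cpu` or `+cu111`; builds without a tag are classified from the CUDA version alone.

```python
import re
from typing import Optional


def describe_torch_build(version: str, cuda_version: Optional[str]) -> str:
    """Classify a PyTorch install from its version string and CUDA version.

    `version` is meant to be torch.__version__ and `cuda_version` is meant
    to be torch.version.cuda (None when the build has no CUDA support).
    """
    # Assumption: wheels may carry a local tag such as "+cpu" or "+cu111".
    match = re.search(r"\+(cpu|cu\d+)$", version)
    tag = match.group(1) if match else None

    if cuda_version is None or tag == "cpu":
        return "cpu-only build"
    return f"cuda build (toolkit {cuda_version})"


# Illustrative inputs only; real values come from the installed package.
print(describe_torch_build("1.9.0+cpu", None))
print(describe_torch_build("1.9.0+cu111", "11.1"))
```

In the container you would call it as `describe_torch_build(torch.__version__, torch.version.cuda)` after `import torch`, and additionally check `torch.cuda.is_available()`, which also depends on the driver and on the container being started with GPU access.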

Also, did you try to install the PyTorch 1.9.0 binaries built for CUDA 11.1? I don’t know which driver is installed on the node, but CUDA enhanced compatibility might allow them to run on it.
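
To make the driver reasoning above concrete, here is a rough sketch that compares an installed driver version against the minimum a given CUDA toolkit needs. The version numbers in the table are assumptions quoted from memory of NVIDIA's CUDA compatibility documentation for Linux and should be verified there before relying on them. `driver_supports` and `parse_driver` are hypothetical helper names.

```python
from typing import Tuple

# ASSUMED minimum Linux driver versions per CUDA toolkit; please verify
# against NVIDIA's CUDA compatibility documentation.
MIN_LINUX_DRIVER = {
    "10.2": (440, 33),
    "11.0": (450, 36, 6),
    "11.1": (455, 23),
}

# ASSUMED driver floor for CUDA 11.x minor-version (enhanced) compatibility.
CUDA11_MINOR_COMPAT_FLOOR = (450, 80, 2)


def parse_driver(version: str) -> Tuple[int, ...]:
    """Turn a driver string such as '450.51.06' into (450, 51, 6)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports(driver: str, toolkit: str) -> bool:
    """Return True if the driver is expected to run binaries built for toolkit."""
    have = parse_driver(driver)
    if have >= MIN_LINUX_DRIVER[toolkit]:
        return True
    # CUDA 11.x binaries may still run on an older 11.x-era driver through
    # minor-version compatibility, as long as the driver meets the floor.
    return toolkit.startswith("11.") and have >= CUDA11_MINOR_COMPAT_FLOOR


# Illustrative driver version; the real one can be read on the node with
# nvidia-smi --query-gpu=driver_version --format=csv,noheader
for toolkit in ("10.2", "11.0", "11.1"):
    print(toolkit, driver_supports("450.80.02", toolkit))
```

Under these assumed numbers, a driver that shipped with CUDA 11.0 would cover a CUDA 10.2 container, while the CUDA 11.1 binaries would depend on whether the driver is recent enough for enhanced compatibility.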