Compiling PyTorch 1.7.0 with CUDA compute capability 3.0 fails due to declaration conflict

Hey, I’m trying to build PyTorch to work with an NVIDIA Quadro K4000.
I set up CUDA 10.1.243 and tested it.
Then I installed cuDNN 7.6.2.24.
I found several tutorials on the internet claiming that PyTorch 1.7.0 works with CUDA 10.1.
But I can’t use the prebuilt legacy binaries, since they require a higher compute capability than the K4000’s 3.0.
A lot of posts in this forum claim that you can get newer versions of PyTorch to work with a lower compute capability by building them from source, so that is what I tried.

I followed the build instructions from the PyTorch 1.7.0 README: GitHub - pytorch/pytorch at v1.7.0
I installed all common dependencies and magma-cuda101.
At first everything worked, and my machine compiled the first ~3000 files without problems.
(I’m using GCC 8.3.0.)

But then the build fails like this:

In file included from /nosave/pytorch/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.cpp:1:
/nosave/pytorch/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h:33:5: error: declaration of ‘nvrtcResult (* at::cuda::NVRTC::nvrtcVersion)(int*, int*)’ [-fpermissive]
   _(nvrtcVersion)                                \
     ^~~~~~~~~~~~
/nosave/pytorch/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h:101:45: note: in definition of macro ‘CREATE_MEMBER’
 #define CREATE_MEMBER(name) decltype(&name) name;
                                             ^~~~
/nosave/pytorch/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h:102:3: note: in expansion of macro ‘AT_FORALL_NVRTC’
   AT_FORALL_NVRTC(CREATE_MEMBER)
   ^~~~~~~~~~~~~~~
In file included from /nosave/pytorch/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h:5,
                 from /nosave/pytorch/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.cpp:1:
/software/easybuild/software/CUDA/10.1.243/include/nvrtc.h:92:13: error: changes meaning of ‘nvrtcVersion’ from ‘nvrtcResult nvrtcVersion(int*, int*)’ [-fpermissive]
 nvrtcResult nvrtcVersion(int *major, int *minor);

Surprisingly, I could not find a single other report of this error on Google.

I’ve tried this with export TORCH_CUDA_ARCH_LIST=3.0 and without it.

This looks like a compatibility issue to me, but I’m not sure how that is possible,
since PyTorch 1.7.0 is supposed to work with CUDA 10.1…

Can somebody please help me get this to compile?
Or suggest an alternative approach that would let me accelerate training on this GPU?