NVRTC error with CUDA

I am using PyTorch 2.5.1 (CUDA 12.4) with Python 3.10.15 on Windows 11. I am performing operations on complex numbers as follows:

import torch

a = torch.randn(2, 10, 5, 6, device="cuda")  # dtype -> torch.float32
c = (1j * a).exp()  # expected dtype -> torch.complex64

I am getting the following error:

nvrtc: error: failed to open nvrtc-builtins64_124.dll.
Make sure that nvrtc-builtins64_124.dll is installed correctly.

Executing only `1j * a` does not raise an error; the error occurs during the exponentiation. The complete error message is too long to paste here, but the first few lines are:

RuntimeError: 
  #ifdef __HIPCC__
  #define ERROR_UNSUPPORTED_CAST ;
  // corresponds to aten/src/ATen/native/cuda/thread_constants.h
  #define CUDA_OR_ROCM_NUM_THREADS 256
  // corresponds to aten/src/ATen/cuda/detail/OffsetCalculator.cuh
  #define MAX_DIMS 16
  #ifndef __forceinline__
  #define __forceinline__ inline __attribute__((always_inline))
  #endif
  #else

The message mentions an unsupported cast error, but I am not sure why it occurs or how to fix it. I have tried different Python versions (3.11, 3.12) and PyTorch versions (2.5.1+cu121, 2.5.1+cu118, 2.4.0+cu124), but nothing seems to work. I have the CUDA Toolkit and cuDNN installed from NVIDIA’s website. Thanks in advance for your help.

As an update, the same error occurs for abs() and log() on the complex tensor, but not for sum(). It is not clear why only some of the operations fail on CUDA. I need to resolve this to run the rest of the code on the GPU, and I would appreciate any insights.
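As a stopgap until the nvrtc issue is fixed, the failing complex ops could be run on the CPU and the result moved back to the GPU. This is a hedged sketch, not a fix: it assumes the error is limited to these jitted complex kernels, and it adds host/device transfer overhead.

```python
# Workaround sketch: compute the complex ops that trigger the nvrtc error on
# the CPU, then move the result back to the tensor's original device.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2, 10, 5, 6, device=device)  # dtype -> torch.float32

# (1j * a).exp() on CPU avoids the CUDA runtime compilation path entirely.
c = (1j * a.cpu()).exp().to(a.device)  # dtype -> torch.complex64
```

By Euler's formula the result should match `cos(a) + i*sin(a)`, which gives a quick way to sanity-check the CPU fallback.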


The error is pointing to a missing nvrtc library in your build, which is used for runtime compilation of some kernels. I don’t have a Windows system to reproduce the issue and cannot reproduce it on Linux.
As a workaround you could try installing nvrtc manually.
CC @malfet in case you have seen this issue before.
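One hedged way to attempt that manual install, assuming a pip-managed environment and CUDA 12.x wheels (the `nvidia-cuda-nvrtc-cu12` package is NVIDIA's PyPI redistributable; whether PyTorch picks the DLLs up from there automatically is not guaranteed):

```shell
# Install NVIDIA's redistributable nvrtc wheel into the same environment
# as PyTorch; on Windows it ships nvrtc64_*.dll and nvrtc-builtins64_*.dll.
pip install nvidia-cuda-nvrtc-cu12

# Verify the files landed (the "bin" subfolder is the Windows wheel layout;
# Linux wheels use "lib" instead):
python -c "import nvidia.cuda_nvrtc, os; p = os.path.dirname(nvidia.cuda_nvrtc.__file__); print([os.path.join(d, f) for d in os.listdir(p) for f in os.listdir(os.path.join(p, d)) if os.path.isdir(os.path.join(p, d))])"
```

If PyTorch still cannot find the DLL, copying it next to `torch\lib` or adding its folder to the DLL search path (e.g. `os.add_dll_directory` on Windows) may be necessary.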