RuntimeError: error in LoadLibraryA

I am trying to run the following on the GPU. The matmul is fine, but once it hits the torch.cat call, it throws this exception:

Traceback (most recent call last):
  File "~/", line 9, in <module>
    c_x = torch.cat(c_x_output, dim=1)
RuntimeError: error in LoadLibraryA

import torch

rand1 = torch.rand((2, 256)).cuda()
rand2 = torch.rand((2, 256)).cuda()

c_x = torch.matmul(rand1, rand2.t())

c_x_output = [rand1, rand2]
c_x = torch.cat(c_x_output, dim=1)


OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.1


The same exception is thrown for me. :hot_face: :hot_face:
OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.0

Anybody else?

I downgraded the Python version to 3.7.5 and now it works.

Same here:
OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.1

I worked around the problem by implementing the cat use case that I needed myself.
In my case, the function below solved it.
It takes a list of four 4D tensors and concatenates them along dim=1.
The number of tensors is hardcoded, though.

    def cat(arr, device):
        total_depth = 0
        for x in arr:
            total_depth += x.size()[1]
        num_samples = arr[0].size()[0]
        h = arr[0].size()[2]
        w = arr[0].size()[3]
        concated = torch.zeros((num_samples, total_depth, h, w), device=device)
        last = 0
        concated[:, :arr[0].size()[1], :, :] = arr[0]
        last = arr[0].size()[1]
        concated[:, last:last + arr[1].size()[1], :, :] = arr[1]
        last = last + arr[1].size()[1]
        concated[:, last:last + arr[2].size()[1], :, :] = arr[2]
        last = last + arr[2].size()[1]
        concated[:, last:, :, :] = arr[3]
        return concated
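
The hardcoded function above could be generalized to any number of tensors with a loop. This is only a sketch: it assumes all inputs are 4D and agree in every dimension except dim=1, the name cat_dim1 is made up for illustration, and it has not been checked against the failing Windows setup from this thread.

```python
import torch


def cat_dim1(tensors):
    """Concatenate 4D tensors along dim=1 without calling torch.cat.

    Hypothetical helper; assumes every tensor has the same size in
    dimensions 0, 2 and 3.
    """
    first = tensors[0]
    num_samples, _, h, w = first.size()
    total_depth = sum(t.size(1) for t in tensors)
    # new_zeros keeps the dtype and device of the first tensor.
    out = first.new_zeros((num_samples, total_depth, h, w))
    start = 0
    for t in tensors:
        end = start + t.size(1)
        # Copy each tensor into its own channel range of the output.
        out[:, start:end, :, :] = t
        start = end
    return out
```

Called as cat_dim1([a, b, c, d]), it should give the same result as torch.cat([a, b, c, d], dim=1) on a setup where torch.cat works.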

This might be a Windows-specific error, so @peterjc123 may have seen it before.

Which python distribution are you using? Conda or Python from the official site?

Reproduced with Python 3.8.0 from the official site. Looking into it.


My Python distribution:

Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32

I am using pip, and everything is from the official site.
Also, my CUDA specifications:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243

Everything else is in my comment above.

The workaround is to run the following code after import torch:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
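
If the script also has to run on machines that are not affected, the two lines above could be wrapped in a small guard. This is just a sketch: the function name preload_nvrtc is made up, and it assumes the DLL can be found by name once torch has been imported, as in the workaround above.

```python
import ctypes
import sys


def preload_nvrtc(dll_name="caffe2_nvrtc.dll"):
    """Try to load the DLL manually; return True if it was loaded.

    Hypothetical helper. On non-Windows platforms it does nothing.
    """
    if sys.platform != "win32":
        return False
    try:
        ctypes.cdll.LoadLibrary(dll_name)
    except OSError:
        # The DLL was not found or could not be loaded.
        return False
    return True
```

Call preload_nvrtc() right after import torch and before the first GPU operation. A return value of False on Windows means the manual load itself failed.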

The workaround seems fine for now. I am still hoping this can be fixed in the next PyTorch release, so that the manual caffe2 DLL load is no longer needed.

Fix landed in the release branch.

I have a similar problem :dizzy_face:

but it’s fine when I use the CPU

Is the posted workaround not working for you?

How similar is it? Which Python distribution are you using? Do you have an NVIDIA GPU in your PC, and is the GPU driver installed? What is the exact error message? We didn’t build packages for CUDA 10.2, so does cuda: 10.2 mean that you compiled the package yourself?
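
To answer most of these questions in one go, something like the following could be run and its output pasted into the thread. It is a rough sketch, and collect_env is just an illustrative name. It does not report the GPU driver version, and it only shows the CUDA version that torch was built against.

```python
import platform
import sys


def collect_env():
    """Gather basic facts about the Python and torch installation."""
    info = {
        "os": platform.platform(),
        "python": sys.version,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version the binaries were built with; None for CPU-only builds.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info
```

For example, print(collect_env()) gives a dictionary that can be posted together with the exact error message.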

I also have this problem.
OS: win10
using both libtorch 1.5.0 and 1.6.0 downloaded from
cuda: 10.1
It only happens in C++ when running a loaded TorchScript model on the GPU; on the CPU it works fine.
There is also no problem when running the model in Python.

When I copy caffe2_nvrtc.dll to the same folder as the executable, it works fine. Is there a way to make it work without copying?