RuntimeError: error in LoadLibraryA

I am trying to run the following on the GPU. The matmul is fine, but once it hits torch.cat, it throws the exception:

Traceback (most recent call last):
  File "~/", line 9, in <module>
    c_x = torch.cat(c_x_output, dim=1)
RuntimeError: error in LoadLibraryA

import torch

rand1 = torch.rand((2, 256)).cuda()
rand2 = torch.rand((2, 256)).cuda()

c_x = torch.matmul(rand1, rand2.t())

c_x_output = [rand1, rand2]
c_x = torch.cat(c_x_output, dim=1)


OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.1


The same exception is thrown for me.
OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.0

Anybody else?

I downgraded the Python version to 3.7.5 and now it works.

Same here:
OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.1

I worked around the problem by implementing the cat use case I needed myself.
In my case, the function below solved it.
It takes a list of four 4D tensors and concatenates them along dim=1.
The number of tensors is hardcoded, though.

    def cat(arr, device):
        # Manual replacement for torch.cat(arr, dim=1); expects exactly four 4D tensors.
        total_depth = sum(x.size(1) for x in arr)
        num_samples, _, h, w = arr[0].size()
        concated = torch.zeros((num_samples, total_depth, h, w), device=device)
        # Copy each tensor into its own slice along dim=1.
        last = arr[0].size(1)
        concated[:, :last, :, :] = arr[0]
        concated[:, last:last + arr[1].size(1), :, :] = arr[1]
        last = last + arr[1].size(1)
        concated[:, last:last + arr[2].size(1), :, :] = arr[2]
        last = last + arr[2].size(1)
        concated[:, last:, :, :] = arr[3]
        return concated
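
In case it helps anyone else, here is a rough sketch of how the same slice-assignment idea could be generalized to any number of tensors, still without calling torch.cat. The name `cat_dim1` is made up for this example, and it assumes all tensors share the same dtype, device, and shape apart from dim=1. I have not tried it on the failing Windows setup, so treat it as an illustration rather than a confirmed fix.

```python
import torch


def cat_dim1(tensors):
    """Concatenate tensors along dim=1 via slice assignment (hypothetical helper)."""
    first = tensors[0]
    # Output size along dim=1 is the sum of the input sizes along dim=1.
    total = sum(t.size(1) for t in tensors)
    out_shape = (first.size(0), total) + tuple(first.shape[2:])
    out = torch.zeros(out_shape, dtype=first.dtype, device=first.device)
    start = 0
    for t in tensors:
        end = start + t.size(1)
        # Copy this tensor into its slice of the output.
        out[:, start:end] = t
        start = end
    return out
```

Unlike the hardcoded version, this takes the device from the first tensor instead of a separate argument, which is just a design choice for the sketch.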

This might be a Windows-specific error, so @peterjc123 may have seen it before.

Which Python distribution are you using? Conda, or Python from the official site?

Reproduced with Python 3.8.0 from the official site. Looking into it.


My Python distribution:

Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32

I am using pip, and everything is from the official site.
CUDA details:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243

Everything else is in my comment above.

The workaround is to run the following code after import torch:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
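
If the same script also has to run on other platforms, the workaround above could be wrapped in a small guard along these lines. This is only a sketch: the helper name `preload_nvrtc` is made up for this example and is not part of PyTorch, and the only call it relies on is the `ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')` line from the workaround.

```python
import ctypes
import sys


def preload_nvrtc(dll_name="caffe2_nvrtc.dll"):
    """Try to load the DLL up front; return True if it loaded (hypothetical helper)."""
    if sys.platform != "win32":
        # The error in this thread was only reported on Windows, so skip elsewhere.
        return False
    try:
        ctypes.cdll.LoadLibrary(dll_name)
    except OSError:
        # DLL not found or failed to load; leave it to the caller to decide what to do.
        return False
    return True
```

As with the original workaround, call `preload_nvrtc()` right after `import torch` and before the first CUDA operation.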

The workaround seems fine for now. I am still hoping this can be fixed in the next PyTorch release, so that the manual caffe2 DLL load is no longer needed.

Fix landed in the release branch.

I have a similar problem, but it works fine when I use the CPU.

Is the posted workaround not working for you?