Torch.cat RuntimeError: error in LoadLibraryA

maochen · February 26, 2020, 6:54pm

I'm trying to run the following on the GPU. matmul works fine, but as soon as it hits torch.cat, it throws this exception:

Traceback (most recent call last):
  File "~/test.py", line 9, in <module>
    c_x = torch.cat(c_x_output, dim=1)
RuntimeError: error in LoadLibraryA

import torch

rand1 = torch.rand((2, 256)).cuda()
rand2 = torch.rand((2, 256)).cuda()

c_x = torch.mm(rand1, rand2.t())

c_x_output = [rand1, rand2]
c_x = torch.cat(c_x_output, dim=1)

print(c_x.shape)

OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.1

Youyuu_Syuu · February 27, 2020, 7:11am

The same exception is thrown here.
OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.0

anybody else?

Youyuu_Syuu · February 27, 2020, 9:15am

I downgraded the Python version to 3.7.5 and now it works.

moher · March 21, 2020, 1:59pm

same here,
OS: Win10
python: 3.8.1
torch: 1.4.0
CUDA: 10.1

However, I worked around the problem by implementing the cat use case I needed myself.
The function below takes a list of four 4D tensors and concatenates them along dim=1.
The number of tensors is hardcoded, though.

    def cat(arr, device):
        total_depth = 0
        for x in arr:
            total_depth += x.size()[1]
        num_samples = arr[0].size()[0]
        h = arr[0].size()[2]
        w = arr[0].size()[3]
        
        concated = torch.zeros((num_samples, total_depth, h, w), device=device)
        
        last = 0
        concated[:, :arr[0].size()[1], :, :] = arr[0]
        last = arr[0].size()[1]
        concated[:, last:last + arr[1].size()[1], :, :] = arr[1]
        last = last + arr[1].size()[1]
        concated[:, last:last + arr[2].size()[1], :, :] = arr[2]
        last = last + arr[2].size()[1]
        concated[:, last:, :, :] = arr[3]
        
        return concated
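
The same slice-assignment idea can be sketched without hardcoding the number of tensors. This is only an illustration, not code from the thread: the name `cat_dim1` is hypothetical, and it assumes all inputs share dtype, device, and every dimension except dim=1. Whether it sidesteps the LoadLibraryA error on an affected setup depends on slice assignment not hitting the same code path, which is what the function above relied on.

```python
import torch


def cat_dim1(tensors):
    """Concatenate tensors along dim=1 via slice assignment (no torch.cat).

    Hypothetical helper: assumes every tensor has the same dtype, device,
    and shape in all dimensions other than dim=1.
    """
    first = tensors[0]
    total = sum(t.size(1) for t in tensors)
    out_shape = (first.size(0), total) + tuple(first.shape[2:])
    # new_empty allocates on the same device and with the same dtype as `first`.
    out = first.new_empty(out_shape)
    start = 0
    for t in tensors:
        end = start + t.size(1)
        out[:, start:end] = t  # copy this tensor into its slot along dim=1
        start = end
    return out
```

Calling `cat_dim1([a, b, c, d])` should give the same result as the four-tensor function above, and it also accepts 2D inputs such as the `rand1` and `rand2` tensors from the first post.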

ptrblck · March 22, 2020, 5:09am

This might be a Windows-specific error, so @peterjc123 might have seen it before.

peterjc123 · March 22, 2020, 5:47am

Which python distribution are you using? Conda or Python from the official site?

peterjc123 · March 22, 2020, 6:08am

Reproduced with Python 3.8.0 from the official site. Looking into it.

moher · March 22, 2020, 6:15am

@peterjc123

My Python distribution:

Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32

I am using pip, and everything is from the official site.
Also, the CUDA details:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243

Everything else is in my comment above.

peterjc123 · March 22, 2020, 6:46am

The workaround is to run the following code after import torch:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
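
In a script shared between machines, the same call could be wrapped so that it is skipped outside Windows and does not crash when the DLL cannot be found. This is a minimal sketch, not part of PyTorch: the helper name `preload_caffe2_nvrtc` and its `lib_dir` parameter are made up for illustration.

```python
import ctypes
import os
import sys


def preload_caffe2_nvrtc(lib_dir=None):
    """Try to load caffe2_nvrtc.dll up front; return True if it was loaded.

    Hypothetical helper. `lib_dir` is an optional directory to load the DLL
    from; when omitted, the default Windows DLL search is used.
    """
    if sys.platform != "win32":
        return False  # the workaround only applies to Windows
    name = "caffe2_nvrtc.dll"
    path = os.path.join(lib_dir, name) if lib_dir else name
    try:
        ctypes.cdll.LoadLibrary(path)
    except OSError:
        return False  # DLL not found or could not be loaded
    return True
```

It would be called right after `import torch`, for example as `preload_caffe2_nvrtc()`. Passing something like `os.path.join(os.path.dirname(torch.__file__), "lib")` as `lib_dir` assumes the wheel ships its DLLs in that folder, which is worth checking on your own install.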

maochen · March 25, 2020, 9:21pm

The workaround seems fine for now. I'm still wondering whether this can be fixed in the next PyTorch release, rather than loading the caffe2 DLL manually like this.

peterjc123 · March 26, 2020, 2:03am

Fix landed in the release branch.

lllllliuxt · April 4, 2020, 1:54pm

I have a similar problem.
OS: Win10
python: 3.7.1
torch: 1.3.1
CUDA: 10.2

But it's fine when I use the CPU.

ptrblck · April 5, 2020, 3:07am

Is the posted workaround not working for you?

peterjc123 · April 25, 2020, 11:50am

How similar is it? Which Python distribution are you using? Do you have an NVIDIA GPU in your PC? Do you have the GPU driver installed? What is the exact error message? We didn't compile packages for CUDA 10.2, so does "cuda: 10.2" mean that you compiled the package yourself?

TBonus · August 24, 2020, 1:40pm

Hi.
I also have this problem.
OS: Win10
libtorch: 1.5.0 and 1.6.0, both downloaded from pytorch.org
CUDA: 10.1
It only happens in C++ when running a TorchScript-loaded model on the GPU; on the CPU it works fine.
There is also no problem when running the model in Python.

Update:
When I copy caffe2_nvrtc.dll into the same folder as the executable, it works fine. Is there any way to make it work without copying?