Undefined symbol: _ZNK2at6Tensor4dataIcEEPT_v when building a PyTorch CUDA extension

Hi everyone,

I'm building a CUDA extension for my PyTorch script, but the following error occurs when the module is loaded:
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /tmp/torch_extensions/tools/tools.so: undefined symbol: _ZNK2at6Tensor4dataIcEEPT_v

Could anyone shed some light on this?

The code is very simple.

from torch.utils.cpp_extension import load
tools = load(name='tools', sources=['test.cpp', 'test.cu'])
print(tools.dispatch, type(tools.dispatch), str(tools.dispatch))

in file test.cpp:

#include <torch/extension.h>
#include <vector>

#define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")

// declarations
int cuda_dispatch(torch::Tensor input, torch::Tensor codec);

std::vector<torch::Tensor> dispatch(torch::Tensor input, torch::Tensor codec) {
  cuda_dispatch(input, codec);
  return { codec };
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("dispatch", &dispatch, "dispatch input according to quantization levels (CUDA only)");
}

in file test.cu:

#include <torch/extension.h>

#include <cuda.h>
#include <cuda_runtime.h>

template <typename Dtype>
__global__ void kernel_dispatch(
    const Dtype * __restrict__ input,
    char * __restrict__ codec) {
  // kernel body not shown in the post
}

int cuda_dispatch(torch::Tensor input, torch::Tensor codec) {
  const int gsize = 10240;
  const int lsize = 128;

  AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.type(), "kernel_dispatch",
        ( [&] {
          kernel_dispatch <scalar_t> <<< gsize, lsize>>> (
              input.data<scalar_t>(), codec.data<char>());
        }));

  return 0;
}

If the codec tensor is changed to a float tensor and codec.data<char>() is replaced with codec.data<scalar_t>() in test.cu, the error goes away.

So I suspect there is some limitation I'm unaware of in how CUDA templates interact with AT_DISPATCH_FLOATING_TYPES_AND_HALF.
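One way to test that suspicion is to demangle the missing symbol and see which function the loader is actually looking for. The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>`, so it assumes GCC or Clang; the `demangle` helper is a name made up for this example. Running `c++filt` on the symbol from a shell should give the same information.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang only)
#include <cstdlib>    // std::free
#include <iostream>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}

// Prints the readable name of the symbol from the ImportError above.
void print_missing_symbol() {
    std::cout << demangle("_ZNK2at6Tensor4dataIcEEPT_v") << std::endl;
}
```

The demangled name refers to the `at::Tensor::data<char>` member template (the `Ic` in the mangled name encodes the template argument `char`), which suggests the missing piece is that particular accessor rather than anything in the dispatch macro.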

Do you have multiple PyTorch installs? That is often the main cause of such errors.


@smth Hi, thanks for your attention.

I installed PyTorch via pip (in a pyenv-managed environment), and only one instance is installed.
I tried on different machines, such as:
1080 Ti + CUDA 9.0
2080 Ti + CUDA 10.0

I set up a quick demo in this repo: https://github.com/blueardour/extension-demo.git
There are two extensions inside. The only difference is the data type of the codec variable. As I mentioned in the original post, if all the tensors have the same type, no error occurs; otherwise I get the undefined symbol error. (It could indeed be worked around by forcing all tensors to the same data type, but I want to reduce the data precision to save runtime memory in my real applications.)

To run: python demo.py

<built-in method dispatch of PyCapsule object at 0x7fe5b02e4510>
demo2 build failed

Hi, I later found that the error is not related to the CUDA extension. It seems to be a data type conversion error when accessing the tensor.

#include <ATen/ATen.h>
int cuda_dispatch(at::Tensor input, at::Tensor codec) {
  //void *ptr0 = codec.data<char>();  // this would cause the undefined symbol error
  void *ptr1 = codec.data<unsigned char>();  // this is OK
  void *ptr2 = codec.data<short>();  // this is OK
  void *ptr3 = codec.data<int>();  // this is OK
  return 0;
}

It seems that only the char type is not supported.
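A likely explanation, as far as I can tell: in C++, `char`, `signed char` and `unsigned char` are three distinct types, and the accessor appears to be instantiated only for the fixed-width element types (`int8_t` and `uint8_t` are normally `signed char` and `unsigned char`). A template instantiated for those two therefore says nothing about plain `char`. The sketch below illustrates the language rule only; `HasAccessor` is a hypothetical trait invented for this example, not part of ATen.

```cpp
#include <cassert>
#include <type_traits>

// The three character types never compare equal as types, regardless of
// whether plain char is signed or unsigned on the platform.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// Hypothetical trait standing in for "the library ships an instantiation
// for T". Only the two explicitly listed 8-bit types are marked supported.
template <typename T> struct HasAccessor : std::false_type {};
template <> struct HasAccessor<signed char> : std::true_type {};
template <> struct HasAccessor<unsigned char> : std::true_type {};

// Returns true when plain char falls outside the supported set.
bool plain_char_is_unsupported() {
    assert(HasAccessor<signed char>::value);
    assert(HasAccessor<unsigned char>::value);
    return !HasAccessor<char>::value;
}
```

This would be consistent with the observation above that `unsigned char`, `short` and `int` link fine while `char` does not.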

Never mind. Note that the official documentation marks Tensor.data<T>() as deprecated:
at::Tensor::C10_DEPRECATED_MESSAGE("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.")
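If a kernel really needs a `char *`, one possible workaround is to fetch the pointer as a supported 8-bit type and cast it. The sketch below is standalone C++ and does not use ATen: `Int8Buffer`, `sum_codes` and `sum_buffer` are hypothetical stand-ins. With a real `torch.int8` tensor, I believe the analogous expression would be `reinterpret_cast<char*>(codec.data_ptr<int8_t>())`, but I have not checked that against the demo repo.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor holding 8-bit signed integers.
// It is NOT the ATen API, just enough structure to illustrate the cast.
struct Int8Buffer {
    std::vector<std::int8_t> storage;
    std::int8_t* data_ptr() { return storage.data(); }
};

// Stand-in for a routine (for example a kernel launcher) that takes char*.
int sum_codes(const char* codec, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += codec[i];
    }
    return total;
}

// Obtain the pointer as int8_t*, then view the same bytes through char*.
// Accessing any object through a char pointer is allowed by the aliasing rules.
int sum_buffer(Int8Buffer& buf) {
    const char* as_char = reinterpret_cast<const char*>(buf.data_ptr());
    return sum_codes(as_char, buf.storage.size());
}
```

The design choice here is to keep the storage in a type the library does provide an accessor for and to do the conversion to `char *` at the call site, so no extra template instantiation is required.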