Error when building an extension

Here’s my code.

file: q_cu.cpp

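#include <torch/extension.h>  // pulls in ATen and pybind11 (needed for PYBIND11_MODULE)
#include <cmath>

// Implemented in the CUDA source file below.
void run_cu(at::Tensor& out, const at::Tensor& x, int bw, int fl);

at::Tensor run(const at::Tensor& x, int bw, int fl) {
    at::Tensor out = at::empty(x.sizes(), x.options());
    run_cu(out, x, bw, fl);
    return out;
}

// Python binding: exposes run() as my_method2.run
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("run", &run, "run");
}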


#include <ATen/ATen.h>
#include <ATen/cuda/CUDAApplyUtils.cuh>
#include <cmath>

void run_cu(at::Tensor& out, const at::Tensor& x, int bw, int fl) {
    int64_t q_max = (1 << (bw - 1)) - 1;
    int64_t q_min = -(1 << (bw - 1));
    float scale = std::pow(2.0f, fl);
    float inv_scale = 1.0f / scale;
    at::cuda::CUDA_tensor_apply2<float, float>(
        x, out, [=] __device__(const float& src, float& dst) {
            // Fixed-point quantize: round to the nearest multiple of 1/scale
            // (ties away from zero), clamp to [q_min, q_max], then rescale.
            dst = fminf(q_max,
                        fmaxf(q_min,
                              static_cast<int64_t>(std::round(src * scale)))) *
                  inv_scale;
        });
}


from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

setup(
    name="my_method2",
    ext_modules=[
        CUDAExtension("my_method2",
                      ["q_cu.cpp", ""]),
    ],
    cmdclass={"build_ext": BuildExtension},
)

There are no errors when I compile this extension, but importing it fails:

import torch
import my_method2

# output
# ImportError...
# undefined symbol: _ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_

How can I solve this problem?

BTW, I just want to make an extension like fake_quantize_slice_cuda that uses std::round instead of std::nearbyint.
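To make the std::round vs std::nearbyint difference concrete, here is a minimal pure-Python reference of what the kernel above computes. It assumes the intent is fixed-point quantization with bw total bits and fl fractional bits (my reading of q_max, q_min and scale). The helper names round_half_away, round_half_even and fake_quantize are hypothetical, and Python works in float64 while the kernel works in float32, so results may differ for inputs extremely close to a rounding boundary.

```python
import math


def round_half_away(x: float) -> float:
    """Like std::round: ties (x.5) round away from zero."""
    a = abs(x)
    f = math.floor(a)
    # a - f is exact in binary floating point, so the tie test is exact too.
    r = f + 1 if a - f >= 0.5 else f
    return math.copysign(r, x)


def round_half_even(x: float) -> float:
    """Like std::nearbyint in the default FE_TONEAREST mode: ties go to even."""
    return float(round(x))  # Python's round() uses round-half-to-even


def fake_quantize(x: float, bw: int, fl: int, rounder=round_half_away) -> float:
    """Reference for the kernel: clamp(round(x * 2**fl), q_min, q_max) / 2**fl."""
    q_max = (1 << (bw - 1)) - 1
    q_min = -(1 << (bw - 1))
    scale = 2.0 ** fl
    q = rounder(x * scale)
    return min(q_max, max(q_min, q)) / scale
```

For example, with bw=8 and fl=3, the input 0.3125 falls exactly on a tie (2.5 steps): fake_quantize(0.3125, 8, 3) gives 0.375, while passing rounder=round_half_even gives 0.25. Large inputs saturate, e.g. 100.0 maps to 127 / 8 = 15.875. A reference like this can be handy for spot-checking the extension's output on small CPU inputs.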

It seems that at::native::legacy::cuda::_th_copy_ignoring_overlaps_ is not found.
My PyTorch version is 1.3.1.

Does it mean I should not use at::cuda::CUDA_tensor_apply2 in my extension?

Finally, I found this topic: CUDA_tensor_apply in extension gives undefined symbol

As I speculated above, the problem goes away once I replace _th_copy_ignoring_overlaps_ with at::_copy_from. (I will verify that this is correct later.)

Yes, I think it should be fine to replace it with a standard copy function. I used Tensor::copy_ (as you can see here), and it seems to work fine; I explicitly tested overlapping tensors where it's used.
I believe the need for _th_copy_ignoring_overlaps_ is a legacy artifact that no longer applies. In the original THC code, the copy functions were built on the tensor apply function from which CUDA_tensor_apply is derived, so if the tensor apply function called copy, you'd get infinite recursion. That's why it calls _th_copy_ignoring_overlaps_, which avoids the recursion through a bit of a hack. In ATen, the copy functions now use a different internal pointwise apply function, so there is no recursion and the standard copy functions can be used.
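The recursion argument can be illustrated with a small toy model. This is a simplified sketch, not PyTorch code: every function name is hypothetical, "tensors" are plain Python lists, and it assumes (as described above) that the apply function computes into a temporary and copies the result back when the output may overlap. In the legacy layering copy is built on apply2, so that copy-back recurses forever; when copy runs on its own elementwise loop, as in ATen, the same copy-back is safe.

```python
# Toy model only: names are hypothetical and do not correspond to real APIs.

# --- Legacy (THC-style) layering: copy is implemented on top of apply2 ---
def legacy_copy(src, dst, dst_overlaps):
    legacy_apply2(src, dst, lambda v: v, dst_overlaps)


def legacy_apply2(src, dst, op, dst_overlaps):
    if dst_overlaps:
        # Compute into a contiguous temporary, then copy it back into dst.
        tmp = [op(v) for v in src]
        legacy_copy(tmp, dst, dst_overlaps)  # copy -> apply2 -> copy -> ...
    else:
        for i, v in enumerate(src):
            dst[i] = op(v)


# --- ATen-style layering: copy uses a separate internal pointwise loop ---
def elementwise_loop(src, dst, op):
    for i, v in enumerate(src):
        dst[i] = op(v)


def modern_copy(src, dst):
    elementwise_loop(src, dst, lambda v: v)


def modern_apply2(src, dst, op, dst_overlaps):
    if dst_overlaps:
        tmp = [op(v) for v in src]
        modern_copy(tmp, dst)  # safe: copy no longer goes through apply2
    else:
        elementwise_loop(src, dst, op)
```

Calling legacy_apply2(..., dst_overlaps=True) ends in a RecursionError, while modern_apply2 with the same arguments simply fills dst. In this toy, a copy that skips the overlap branch would break the cycle, which corresponds to the role the reply attributes to _th_copy_ignoring_overlaps_.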


Thanks for your analysis. I skimmed your repository and now use copy_ instead of _th_copy_ignoring_overlaps_ in my code.

But now another problem has come up: my extension only works when the tensor is on the current device.

y = fun(  # work fine
y = fun(  # raise error

I'm not asking you to solve this for me; I'll do a careful comparison and more testing later.