CUDA_tensor_apply in extension gives undefined symbol

I’m trying to use CUDA_tensor_applyN from ATen/cuda/CUDAApplyUtils.cuh in an extension. It builds fine, but when I try to import the extension I get:

ImportError: /root/local/miniconda/lib/python3.7/site-packages/mish_cuda/
undefined symbol: _ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_

Demangled: at::native::legacy::cuda::_th_copy_ignoring_overlaps_(at::Tensor&, at::Tensor const&)
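
For readers without c++filt at hand, here is a minimal sketch of how the qualified name above can be read out of the mangled symbol. It assumes the Itanium C++ ABI scheme used by GCC and Clang, and it only handles plain length-prefixed name components inside a nested name (no templates, operators or substitutions). The helper name `qualified_name` is made up for illustration.

```python
def qualified_name(mangled: str) -> str:
    """Extract the qualified name from a simple nested Itanium-mangled symbol.

    Only handles `_ZN <len><identifier>... E`; parameter types after the
    closing `E` are ignored.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    parts = []
    while mangled[pos] != "E":
        start = pos
        # Each component is a decimal length followed by that many characters.
        while mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported component at offset %d" % pos)
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


symbol = "_ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_"
print(qualified_name(symbol))
# prints: at::native::legacy::cuda::_th_copy_ignoring_overlaps_
```

For anything beyond this simple case, c++filt remains the right tool.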

I don’t think it’s a compiler compatibility or linker options issue, as similar posts tend to be, but I have very little C++ dev experience so I could be wrong. I’ve built another extension that imports and runs fine on the same system (though in a different conda environment). I’ve tried with both the conda-installed toolchain (gxx_linux-64 7.3.0/binutils_linux-64 2.31.1) and the system toolchain (Arch Linux). I also tried in a Docker image based off pytorch/extension-script, though with FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 for CUDA support.

Poking around PyTorch libs in the docker I find:

$ objdump -t ${CONDA_PREFIX}/lib/python3.7/site-packages/torch/lib/\
  | grep '_ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_'
0000000003bfed90 l     F .text  0000000000000552              _ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_
$ objdump -T ${CONDA_PREFIX}/lib/python3.7/site-packages/torch/lib/ \
  | grep '_ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_'
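
To make the output above easier to read: in `objdump -t` output the first flag character after the address gives the symbol's binding (`l` local, `g` global), and the empty result from `objdump -T` means the symbol is absent from the dynamic symbol table. The sketch below classifies a single `objdump -t` line. The helper name `symbol_binding` is hypothetical, and the parsing assumes the usual ELF layout of address, seven flag characters, section, size and name.

```python
def symbol_binding(objdump_line: str) -> str:
    """Classify one line of `objdump -t` output by its binding flag."""
    # Layout: <address> <7 flag characters> <section> <size> <name>
    flags_start = objdump_line.index(" ") + 1
    flags = objdump_line[flags_start:flags_start + 7]
    bindings = {
        "l": "local",
        "g": "global",
        "u": "unique global",
        "!": "both local and global",
    }
    # A space in the first flag column means neither local nor global.
    return bindings.get(flags[0], "neither")


line = ("0000000003bfed90 l     F .text  0000000000000552              "
        "_ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_")
print(symbol_binding(line))
# prints: local
```

A local symbol is visible only inside the library that defines it, so an extension that references it cannot resolve it at import time.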

I’m not using any custom compiler/linker options (except --expt-extended-lambda for nvcc). I did try explicitly linking against which didn’t help, and I gather it isn’t needed, as CUDAExtension explicitly links this only on Windows.

I nastily hacked out that function (only used for overlapping tensors) through a #define and the extension builds, imports and runs fine.


This is terribly hard to debug remotely. For me, something like this seems to arise most often when the headers you use in C++ and the library (torch) Python uses are out of sync, i.e. have different versions.
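
One way to rule out a header/library mismatch is to print which torch package Python imports and which include directories extensions are compiled against. This is only a sketch: the function name `describe_torch_build_env` is made up, and it assumes `torch.utils.cpp_extension.include_paths()` can be called without arguments in the installed version.

```python
import importlib.util
import os


def describe_torch_build_env() -> dict:
    """Collect the torch version, location and extension include paths."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch
    from torch.utils.cpp_extension import include_paths

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.version.cuda,  # None for CPU-only builds
        "package_dir": os.path.dirname(torch.__file__),
        "include_paths": include_paths(),
    }


for key, value in describe_torch_build_env().items():
    print(key, "=", value)
```

If the include paths sit outside the reported package directory, the extension may have been compiled against headers from a different install than the one being imported.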

Best regards



Thanks a lot for replying. Completely understand the difficulty of debugging.

I’m building/linking against headers/libs out of the conda environment I’m testing in (and tried in a docker), so not quite sure how they’d be out of sync, unless I’m missing something there.

I have now put up the source for the extension. There’s a (and Dockerfile I also tested against). As noted the error is only on import so:

$ pip install git+ && python -c "import mish_cuda"
:Collecting git+
ImportError: /home/user/dev/mish-cuda/.venv/lib/python3.7/site-packages/mish_cuda/
undefined symbol: _ZN2at6native6legacy4cuda27_th_copy_ignoring_overlaps_ERNS_6TensorERKS3_

And here’s an inline JIT reproduction. It calls the function causing the issue directly, rather than going through CUDA_tensor_apply, which is what I actually care about. It gives me the same error as above:

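#!/usr/bin/env python
from torch.utils.cpp_extension import load_inline

# C++ source; load_inline prepends #include <torch/extension.h> itself.
src = """
#include <ATen/LegacyTHFunctionsCUDA.h>
void import_test(torch::Tensor x, const torch::Tensor y) {
    // Call the unexported function directly to trigger the undefined symbol.
    auto z = at::native::legacy::cuda::_th_copy_ignoring_overlaps_(x, y);
}
"""
# The build succeeds; the ImportError appears when the module is loaded.
mod = load_inline(name='import_error_test', cpp_sources=src, functions='import_test')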

I’ll also try to replicate this against libtorch to see if it’s just an extension issue (though as it also looks to link dynamically, I suspect not; static linking may behave differently, as far as my minimal knowledge goes). I just need to get on top of the process there.

One thing I’m a bit unclear on is exactly what is supposed to be externally usable in PyTorch. If I were more sure that this was externally usable, and that it wasn’t a system config error at my end, I’d submit an issue. Ideally I’d submit a PR, but I struggle a bit to understand how exactly the declaration generation works in PyTorch.
The function is defined in src/ATen/Declarations.cwrap and does appear in the generated include/ATen/LegacyTHFunctionsCUDA.h.
So some confirmation there would be great, or pointers to the parts of the generation process I might look at.

Yes, so as far as I can see it is not part of the public API (none of the functions in that header are): it is not declared TORCH_API.

Best regards


Thanks a lot, I hadn’t noticed that define. Does that apply to header-only functions? It looks like perhaps not, as PackedTensorAccessor is not declared with it but I gather it is public API. So then it’s a little unclear whether CUDA_tensor_apply is intended to be private or not.

Yeah, it doesn’t apply to header only functions. However, as you noticed, it relies on _th_copy_ignoring_overlaps_ which isn’t exported.
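
Whether a symbol is exported can also be checked at runtime, since `ctypes` resolves names through the dynamic loader and therefore only sees symbols in the dynamic symbol table. The sketch below is an illustration under that assumption; the helper name `is_exported` is made up, and a positive result may also come from one of the library's dependencies.

```python
import ctypes
from typing import Optional


def is_exported(library_path: Optional[str], symbol: str) -> bool:
    """Return True if the dynamic loader can resolve `symbol` via the library.

    Passing None opens the current process on Linux.
    """
    lib = ctypes.CDLL(library_path)
    try:
        lib[symbol]  # raises AttributeError when the lookup fails
    except AttributeError:
        return False
    return True


# Sanity check against the running process: libc's printf is exported.
print(is_exported(None, "printf"))
```

Calling it with the path of the torch library inspected earlier and the mangled `_th_copy_ignoring_overlaps_` name would be expected to return False, matching the empty `objdump -T` output.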

So it could be that CUDA_tensor_apply is not that well-liked (with TensorIterator taking its place but not being public API), but I would not know.

If you are very keen to use it (and I can see how “have a utility for applying pointwise functions” is a very reasonable thing), maybe opening an issue asking about it is good.
It could be that we just export the missing symbol and are done with it, or that there is another preferred solution.

Best regards



Yeah, looking at TORCH_API I realised it couldn’t be used with a header-only function, as it controls import/export, which is not appropriate there. I gather there’s no equivalent marking for header-only functions. Though yes, using private functions would indicate that it isn’t intended to be public. I guess underlying this is me not really being sure about the include folder in the conda package (which seems to mirror the include folder in libtorch). For the torch includes this looks to come from torch/csrc/api/include/torch, but I’m not sure how the ATen and c10 stuff gets in there. My initial expectation that everything in these includes was intended to be public seems wrong. I gather this might also be due to some things that were designed for internal use now being made public as libtorch and the C++ interface are developed.

Investigating the private function, it looks like its use may be a legacy from when this was adapted from THC code. Following subsequent changes in ATen, I think it can now be safely replaced with the standard Tensor::copy_. I think this is the only location where this function is used in ATen, so this would allow removal of the ATen wrapper, which I gather would be desirable in the ongoing work to remove THC dependencies. The function is also a slightly nasty hack (as noted in code comments), so it would be nice to be able to avoid it.
I just need to confirm this a little more and then I’ll submit an issue. This seems a stronger reason for removing it than allowing public use, which, while nice, is likely not the primary concern of PyTorch devs. So hopefully a win/win.

Thanks again, and also thanks for your wonderful articles. The one on autograd functions and JIT operators in C++ in particular has been especially helpful in starting to get a handle on how they fit together which was previously eluding me.