Writing a C++/CUDA extension: inline void parallel_for() does not work

I am writing a PyTorch extension in C++/CUDA and I have run into a problem. I want to use the function inline void parallel_for(const int64_t begin, const int64_t end, const int64_t grain_size, const F& f); from ATen/Parallel.h. When I compile the code, I get this error:

/.local/lib/python3.6/site-packages/torch/include/ATen/Parallel.h:48:13: error: ‘void at::parallel_for(int64_t, int64_t, int64_t, const F&) [with F = geomean_pool2d_backward_out_frame(scalar_t, scalar_t*, scalar_t*, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int, int, int, int, int, int, bool, c10::optional) [with scalar_t = float; int64_t = long int]::<lambda(int64_t, int64_t)>; int64_t = long int]’, declared using local type ‘const geomean_pool2d_backward_out_frame(scalar_t*, scalar_t*, scalar_t*, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int, int, int, int, int, int, bool, c10::optional) [with scalar_t = float; int64_t = long int]::<lambda(int64_t, int64_t)>’, is used but never defined [-fpermissive]

Can you help me solve this problem? I really want to know how to use inline void at::parallel_for() in my code. Please help me!
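For reference, my call site looks roughly like this (heavily simplified; the real function takes the full list of size/stride/padding arguments shown in the error message):

```cpp
#include <ATen/Parallel.h>

template <typename scalar_t>
static void geomean_pool2d_backward_out_frame(
    scalar_t* grad_input, const scalar_t* grad_output,
    int64_t nbatch /* ...other arguments omitted... */) {
  // Split the batch dimension across threads; the lambda handles [begin, end).
  at::parallel_for(0, nbatch, 0, [&](int64_t begin, int64_t end) {
    for (int64_t b = begin; b < end; ++b) {
      // accumulate the gradient for batch element b into grad_input
    }
  });
}
```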

Hi, I have tried using parallel_for from ATen/ParallelOpenMP.h directly instead of ATen/Parallel.h. That works for me, though I do not know why. 🙂
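A minimal sketch of what worked for me (the function name and the scaling loop are just an example, not from the original code):

```cpp
#include <ATen/ATen.h>
// ParallelOpenMP.h contains the definition of at::parallel_for,
// whereas Parallel.h only declared it in my build.
#include <ATen/ParallelOpenMP.h>

template <typename scalar_t>
void scale_out_frame(const scalar_t* input, scalar_t* output,
                     int64_t numel, scalar_t factor) {
  // Each worker thread receives a [begin, end) chunk of the index range.
  at::parallel_for(0, numel, /*grain_size=*/2048, [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      output[i] = input[i] * factor;
    }
  });
}
```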