Compiling an Extension with CUDA files

I am following the PyTorch C FFI examples to build a C extension with CUDA.
When I access the data of a CudaTensor as follows, I get a segmentation fault. Is there any way to access the CUDA memory like this, and to run the loop (for/while) in parallel in a C extension with CUDA?

#include <THC/THC.h>

extern THCState *state;

int my_lib_add_forward_cuda(THCudaTensor *input, THCudaTensor *output)
{
  // Raw data pointers into the tensors' memory.
  float * pinput = THCudaTensor_data(state, input);
  float * poutput = THCudaTensor_data(state, output);
  // Host-side loop over all elements.
  for (ptrdiff_t i = 0; i < THCudaTensor_numel(state, input); i++)
  {
    poutput[i] = do_something(pinput[i]);
  }
  return 1;
}

No, you cannot access CUDA memory like this.

Please refer to the CUDA Programming Guide on how to write CUDA programs:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/

Thank you for your reply.

I’m sorry to bother you again :worried:.
I want to know how to bind the CUDA program to PyTorch if I have the program in my_lib.cu.
The FFI in the examples does not seem to support .cu files or nvcc?

You can take the data pointers out of the arguments of your FFI function (using THCudaTensor_data(state, tensor)), inspect its size (assuming it's contiguous - THCudaTensor_numel(state, tensor)), and pass both of these things into your custom kernel. Does that help?
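For reference, a rough sketch of that approach (the kernel do_something_kernel, its body, and the launch configuration are just placeholders for illustration, and it assumes the THC API, including THCState_getCurrentStream; note that a file containing a __global__ kernel still has to be compiled with nvcc, which is what the rest of this thread is about):

#include <THC/THC.h>

extern THCState *state;

// Placeholder element-wise kernel: one thread per element.
__global__ void do_something_kernel(const float *input, float *output, ptrdiff_t n)
{
  ptrdiff_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    output[i] = input[i] * 2.0f;  // stand-in for the real computation
}

int my_lib_add_forward_cuda(THCudaTensor *input, THCudaTensor *output)
{
  float *pinput = THCudaTensor_data(state, input);
  float *poutput = THCudaTensor_data(state, output);
  ptrdiff_t n = THCudaTensor_numel(state, input);

  // Launch on the stream PyTorch is currently using.
  cudaStream_t stream = THCState_getCurrentStream(state);
  int threads = 256;
  int blocks = (int)((n + threads - 1) / threads);
  do_something_kernel<<<blocks, threads, 0, stream>>>(pinput, poutput, n);

  return 1;
}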

Thank you very much. It is helpful.

Hello, I have a question about compiling an extension with cuda files. I get the following error:

distutils.errors.UnknownFileError: unknown file type '.cu'

when I include .cu files in the sources:

import os
import torch
import glob
from torch.utils.ffi import create_extension

this_file = os.path.dirname(__file__)

sources = ['../Library/~.cpp',
            '../Library/~.cpp',
            '../Library/~.cu']
headers = ['../Library/~.h']
here = os.path.abspath(os.path.dirname(__file__))
lib_dir = os.path.join(here, '..', 'Library')
include_dirs = [
    os.path.join(lib_dir, '~'),
    os.path.join(lib_dir, 'Math'),
]
defines = [('WITH_CUDA', None)]
with_cuda = True

ffi = create_extension(
    '_CUDA.~',
    headers=headers,
    sources=sources,
    define_macros=defines,
    relative_to=__file__,
    with_cuda=with_cuda,
    include_dirs = include_dirs,
    extra_compile_args=["-fopenmp"]
)

if __name__ == '__main__':
    ffi.build()
    from _CUDA import ~
    print ~.__dict__

Am I doing something wrong?

No, you’re not. It’s a problem with our CUDA extensions - they never use nvcc to compile your files. A workaround for now is to compile your CUDA kernels manually, link them as a shared library, and add that shared library in the libraries argument of create_extension. Sorry for that :confused:

Here is an example of compiling extensions with CUDA files: https://github.com/longcw/yolo2-pytorch/tree/master/layers/reorg
You can compile CUDA kernels manually and link them into PyTorch extensions just like @apaszke said.

mask.sh

#!/usr/bin/env bash

CUDA_PATH=/usr/local/cuda/

cd layers/reorg/src
echo "Compiling reorg layer kernels by nvcc..."
nvcc -c -o reorg_cuda_kernel.cu.o reorg_cuda_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52

cd ../
python build.py

build.py

import os
import torch
from torch.utils.ffi import create_extension


sources = ['src/reorg_cpu.c']
headers = ['src/reorg_cpu.h']
defines = []
with_cuda = False

if torch.cuda.is_available():
    print('Including CUDA code.')
    sources += ['src/reorg_cuda.c']
    headers += ['src/reorg_cuda.h']
    defines += [('WITH_CUDA', None)]
    with_cuda = True

this_file = os.path.dirname(os.path.realpath(__file__))
# print(this_file)
extra_objects = ['src/reorg_cuda_kernel.cu.o']
extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]

ffi = create_extension(
    '_ext.reorg_layer',
    headers=headers,
    sources=sources,
    define_macros=defines,
    relative_to=__file__,
    with_cuda=with_cuda,
    extra_objects=extra_objects
)

if __name__ == '__main__':
    ffi.build()

Hope it helps you.


Sorry for the inconvenience, we’re going to be fixing that in the future.

Thank you for your comments! With your help, I hacked my way through. I did it using a slightly different argument of ffibuilder.set_source than described previously:
I first compile all the CUDA kernels:
CMake:

CUDA_ADD_LIBRARY(~ STATIC c~CUDAKernels.cu)
TARGET_LINK_LIBRARIES(~)

and then link them as an extra link argument:

...
extra_link_args=['~.a']
ffi = create_extension(
    '~',
    headers=headers,
    sources=sources,
    define_macros=defines,
    relative_to=__file__,
    with_cuda=with_cuda,
    include_dirs = include_dirs,
    extra_compile_args=["-fopenmp"],
    extra_link_args = extra_link_args
)
...

This workaround does work with a static library. I had problems trying to link a shared library this way (haven’t tried enough though).

It would be great to mention this in the PyTorch documentation. I struggled a bit before finding this post.
Thanks for the thread!


@apaszke, as @ThibaultGROUEIX said, it would be better to mention this in the PyTorch documentation. This also wasted me a lot of time before I found this post.

BTW, I highly recommend this full tutorial example pytorch-custom-cuda-tutorial

Hi, your code is really helpful (including several other repositories). I am actually following your code to write my own operator, but I am a little confused about contiguity, because I cannot tell which operations make results non-contiguous. So, should we always check whether the tensor is contiguous before we get the data pointer with THCudaTensor_data?

@Jiang_He I have the same concern. If an input tensor is not contiguous, using THCudaTensor_data may read the wrong data. In the new C++ extension tutorial, the check is always performed:

#define CHECK_CONTIGUOUS(x) AT_ASSERT(x.is_contiguous(), #x " must be contiguous")
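For the THC-based extensions discussed earlier in this thread, a similar guard can go right before taking the data pointer. A minimal sketch, assuming THCudaTensor_isContiguous is available in the THC API (alternatively, call .contiguous() on the tensor in Python before passing it in):

#include <THC/THC.h>

extern THCState *state;

int my_op_forward_cuda(THCudaTensor *input, THCudaTensor *output)
{
  // Refuse non-contiguous tensors; otherwise the raw pointers below
  // would be indexed with the wrong memory layout.
  if (!THCudaTensor_isContiguous(state, input) ||
      !THCudaTensor_isContiguous(state, output))
    return 0;

  float *pinput = THCudaTensor_data(state, input);
  float *poutput = THCudaTensor_data(state, output);
  /* ... launch the kernel as before ... */
  return 1;
}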