Recently I’ve been trying to wrap my CUDA kernel with a PyTorch C extension. Most examples I can find online use THCudaTensor* in the C wrapper function, but as far as I can tell that tensor type holds float data. My CUDA kernel takes uint8 inputs and produces int16 output. Should I still use THCudaTensor* as the interface and cast the data pointers to uint8/int16 internally?
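I did notice the THC headers also generate typed variants such as THCudaByteTensor and THCudaShortTensor. Is a typed signature like the sketch below the intended approach instead? (This is just my guess at what it would look like, assuming the typed _data accessors behave like the float one.)

#include <THC/THC.h>
#include <stdint.h>

// Sketch only (my assumption, not code I have verified):
// THCudaByteTensor stores uint8, THCudaShortTensor stores int16.
void c_wrapper_typed(THCState* state,
                     THCudaByteTensor* input1,
                     THCudaByteTensor* input2,
                     THCudaShortTensor* output)
{
    // THCudaByteTensor_data returns unsigned char*, which matches uint8_t
    uint8_t* input1_ = THCudaByteTensor_data(state, input1);
    uint8_t* input2_ = THCudaByteTensor_data(state, input2);
    // THCudaShortTensor_data returns short*, which matches int16_t
    int16_t* output_ = THCudaShortTensor_data(state, output);
    // ... launch the kernels with these typed pointers, no casting needed ...
}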
Also, inside the wrapper function I want to call the CUDA kernels several times, and each call produces an intermediate tensor as output. Should I pass these temporary buffers in from outside, or can I allocate them inside the wrapper? (I've put a sketch of what I mean below the code.)
My CUDA kernels:
__global__ void my_cuda_kernel(uint8_t* input1, uint8_t* input2, int16_t* output)
{
    // do something
}

__global__ void my_cuda_kernel_2(uint8_t* input1, uint8_t* input2, int16_t* input3, int16_t* output)
{
    // do something
}
void c_wrapper(THCState* state, THCudaTensor* input1, THCudaTensor* input2, THCudaTensor* output)
{
    uint8_t* input1_ = (uint8_t*)THCudaTensor_data(state, input1); // THCudaTensor_data returns float* -- is a raw cast like this correct?
    uint8_t* input2_ = (uint8_t*)THCudaTensor_data(state, input2);
    int16_t* output_ = (int16_t*)THCudaTensor_data(state, output); // same question here
    int16_t* temporary_out = xxxx; // how can I allocate memory from within?
    my_cuda_kernel<<<32, 32, 0, THCState_getCurrentStream(state)>>>(input1_, input2_, temporary_out);
    my_cuda_kernel_2<<<32, 32, 0, THCState_getCurrentStream(state)>>>(input1_, input2_, temporary_out, output_);
}
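For the temporary, this is roughly what I had in mind, assuming THCudaMalloc/THCudaFree are the right THC helpers for scratch memory and that the temporary has the same element count as input1 (both are guesses on my part):

// Sketch of allocating the temporary inside the wrapper:
int16_t* temporary_out = NULL;
ptrdiff_t n = THCudaTensor_nElement(state, input1); // guessed size: same as input1
THCudaMalloc(state, (void**)&temporary_out, n * sizeof(int16_t));

my_cuda_kernel<<<32, 32, 0, THCState_getCurrentStream(state)>>>(input1_, input2_, temporary_out);
my_cuda_kernel_2<<<32, 32, 0, THCState_getCurrentStream(state)>>>(input1_, input2_, temporary_out, output_);

THCudaFree(state, temporary_out); // both kernels run on the same stream, so I assume ordering is safe?

Or would it be better to create an actual THCudaShortTensor for the temporary, so PyTorch's caching allocator manages the memory instead?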