Hi
In the profiler output, I see that unrolled_elementwise_kernel accounts for a noticeable share of GPU time (it is the kernel with the second-highest GPU time). However, when I looked at the source code, this kernel only builds a policy object and then delegates to another function, elementwise_kernel_helper:
__global__ void unrolled_elementwise_kernel(int N, func_t f, array_t data,
                                            inp_calc_t ic, out_calc_t oc, loader_t l, storer_t s)
{
  int remaining = N - block_work_size * blockIdx.x;
  auto policy = memory::policies::unroll<array_t, inp_calc_t, out_calc_t, loader_t, storer_t>(data, remaining, ic, oc, l, s);
  elementwise_kernel_helper(f, policy);
}
So I wonder why this kernel shows up in the profiler output at all. What can be understood from its GPU time?
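
To make the question concrete, here is a minimal host-side C++ sketch of the pattern as I understand it. It is not the PyTorch source: `helper`, `entry_point`, and `run_all_blocks` are hypothetical names, and the loop over blocks stands in for the GPU grid. My assumption is that the helper is compiled into the kernel rather than launched separately, so any time spent inside it would be attributed to the entry point, which is the only name a profiler would see.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for elementwise_kernel_helper:
// this is where the per-element work actually happens.
template <typename Func>
inline void helper(Func f, std::vector<float>& data,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        data[i] = f(data[i]);
    }
}

// Hypothetical stand-in for unrolled_elementwise_kernel:
// it only works out which elements belong to this block
// and forwards everything else to the helper.
template <typename Func>
void entry_point(std::size_t n, Func f, std::vector<float>& data,
                 std::size_t block_work_size, std::size_t block_idx) {
    std::size_t begin = block_work_size * block_idx;
    if (begin >= n) {
        return;  // this block has nothing to do
    }
    std::size_t remaining = n - begin;
    std::size_t count =
        remaining < block_work_size ? remaining : block_work_size;
    helper(f, data, begin, begin + count);
}

// Hypothetical driver: runs the "blocks" one after another on the CPU
// and doubles every element, returning the modified copy.
std::vector<float> run_all_blocks(std::vector<float> data,
                                  std::size_t block_work_size) {
    if (block_work_size == 0) {
        return data;  // avoid dividing by zero below
    }
    std::size_t n = data.size();
    std::size_t blocks = (n + block_work_size - 1) / block_work_size;
    for (std::size_t b = 0; b < blocks; ++b) {
        entry_point(n, [](float x) { return x * 2.0f; }, data,
                    block_work_size, b);
    }
    return data;
}
```

In this sketch, timing `entry_point` from the outside would include all the time spent in `helper`, even though `entry_point` contains no arithmetic of its own. I would like to know whether the same reasoning applies to the time reported for unrolled_elementwise_kernel.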