This function doesn't have the tensor slice and copy operations, but the conv2d input and weight sizes are the same as in the function above; it takes about 14 ms.
Why do the tensor slice and tensor copy operations add so much time?
Can anyone help me accelerate the tensor slice and tensor copy operations?
Thanks so much!
Are you running your code on the CPU or GPU?
In the latter case, note that CUDA calls are asynchronous, so you would need to synchronize before starting and stopping the timer:
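A minimal sketch of such a timing helper is below. The shapes and the `timed` helper are illustrative, not taken from the original code; the key points are the warm-up iterations and the `torch.cuda.synchronize()` calls around the timed region, without which the timer only measures kernel launch overhead.

```python
import time
import torch

def timed(fn, *args, n_warmup=3, n_iters=10):
    """Return the average wall-clock time of fn(*args) in seconds."""
    # Warm up to exclude one-time costs (CUDA context creation, cuDNN autotuning).
    for _ in range(n_warmup):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for all pending kernels before starting the timer
    start = time.perf_counter()
    for _ in range(n_iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the timed kernels to actually finish
    return (time.perf_counter() - start) / n_iters

# Hypothetical shapes; replace with the ones from your model.
device = "cuda" if torch.cuda.is_available() else "cpu"
conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).to(device)
x = torch.randn(1, 64, 128, 128, device=device)

print(f"conv2d: {timed(conv, x) * 1e3:.3f} ms")
```

The same helper can be applied to the slice-and-copy step alone to see how much of the 14 ms it actually accounts for.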
Thanks for your reply. I understand that the tensor slice and copy time shouldn't be counted as part of the conv2d runtime, but is there a solid way to speed up the slice and copy themselves? They add a lot of latency before the conv2d call.
I mean that you might be timing the code incorrectly, and the majority of the time is indeed spent in the convolution.
Could you time your code again with the synchronization added and post the timings?