Conv3d can be optimized (for the case with no temporal stride)

If I perform a Conv3d operation with no temporal stride and a kernel of size 1 along the temporal dimension, it is equivalent to performing a spatial convolution on a batch of images.
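To make the equivalence concrete, here is a minimal sketch that checks it numerically on the CPU. The tensor sizes and the helper name `conv3d_as_conv2d` are made up for illustration and are not taken from the scripts; the only assumption is a `(1, k, k)` kernel with temporal stride 1. Note that the channel axis has to be moved out of the way before frames are folded into the batch axis, otherwise the reshape mixes channels and frames.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
N, C_in, C_out, T, H, W = 2, 3, 4, 5, 16, 16
k = 3  # spatial kernel size; the temporal kernel size is 1

conv3d = nn.Conv3d(C_in, C_out, kernel_size=(1, k, k), padding=(0, k // 2, k // 2))
conv2d = nn.Conv2d(C_in, C_out, kernel_size=k, padding=k // 2)

# Share parameters: drop the size-1 temporal axis of the 3D kernel.
with torch.no_grad():
    conv2d.weight.copy_(conv3d.weight.squeeze(2))
    conv2d.bias.copy_(conv3d.bias)


def conv3d_as_conv2d(x, conv2d):
    """Apply a 2D conv to every frame of a (N, C, T, H, W) tensor."""
    n, c, t, h, w = x.shape
    # (N, C, T, H, W) -> (N, T, C, H, W) -> (N*T, C, H, W)
    frames = x.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    y = conv2d(frames)
    # (N*T, C', H', W') -> (N, T, C', H', W') -> (N, C', T, H', W')
    y = y.reshape(n, t, y.shape[1], y.shape[2], y.shape[3])
    return y.permute(0, 2, 1, 3, 4)


x = torch.randn(N, C_in, T, H, W)
with torch.no_grad():
    y3d = conv3d(x)
    y2d = conv3d_as_conv2d(x, conv2d)

# Largest element-wise difference; expected to be at float rounding level.
max_abs_diff = (y3d - y2d).abs().max().item()
print(max_abs_diff)
```

If the two outputs agree up to floating point error, the 2D path can be used as a drop-in replacement for this particular kernel shape.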
So I ran some timing tests (I tried to synchronize CUDA as described in this post, using cuda.synchronize()).
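For reference, a timing loop with explicit synchronization could look roughly like the sketch below. This is not the script used for the numbers in this post; the helper name `time_forward`, the warm-up count and the tensor sizes are assumptions for illustration. It falls back to the CPU when no GPU is available.

```python
import time

import torch
import torch.nn as nn


def time_forward(module, x, n_iters=10, n_warmup=2):
    """Average wall-clock seconds per forward pass of `module` on `x`."""
    use_cuda = x.is_cuda
    with torch.no_grad():
        # Warm-up runs so one-time setup cost is not measured.
        for _ in range(n_warmup):
            module(x)
        if use_cuda:
            # CUDA kernels launch asynchronously; wait for pending work.
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            module(x)
        if use_cuda:
            # Wait again so the timed kernels have actually finished.
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 3, 8, 32, 32, device=device)  # hypothetical input size
conv = nn.Conv3d(3, 8, kernel_size=(1, 3, 3), padding=(0, 1, 1)).to(device)
seconds = time_forward(conv, x)
print(f"{seconds:.6f} s per forward pass on {device}")
```

Without the second synchronize call, the timer could stop before the GPU has finished, which would make the measured time too small.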

The scripts can be found here for 2D convolutions and here for 3D convolutions.

Here are the outputs:

---------------------------------------------------------------------------
Number Of Frames = 50
----------------------------------------
going to time2D K = 1
Time for 1 Kernel ---- 1.316544 ---- 
----------------------------------------
going to time2D K = 3
Time for 3 Kernel ---- 4.946677 ---- 
----------------------------------------
going to time2D K = 7
Time for 7 Kernel ---- 5.898326 ---- 

---------------------------------------------------------------------------
Number Of Frames = 50
----------------------------------------
going to time3D K = 1
Time for 1 Kernel ---- 1.931768 ---- 
----------------------------------------
going to time3D K = 3
Time for 3 Kernel ---- 9.096771 ---- 
----------------------------------------
going to time3D K = 7
Time for 7 Kernel ---- 42.892779 ---- 

---------------------------------------------------------------------------
Number Of Frames = 30
----------------------------------------
going to time2D K = 1
Time for 1 Kernel ---- 0.786775 ---- 
----------------------------------------
going to time2D K = 3
Time for 3 Kernel ---- 2.987944 ---- 
----------------------------------------
going to time2D K = 7
Time for 7 Kernel ---- 3.638900 ---- 

---------------------------------------------------------------------------
Number Of Frames = 30
----------------------------------------
going to time3D K = 1
Time for 1 Kernel ---- 1.146866 ---- 
----------------------------------------
going to time3D K = 3
Time for 3 Kernel ---- 5.463441 ---- 
----------------------------------------
going to time3D K = 7
Time for 7 Kernel ---- 25.650379 ---- 

For spatial kernel size 7, going through 2D convolutions is about 7 times faster (42.9 vs 5.9 for 50 frames); for spatial kernel size 3 it is almost twice as fast, and even for kernel size 1 it gives a boost of about 1.5 times (unless these are not the actual times and are off due to synchronization effects).
The 2D timings include the time needed to reshape the input (although the reshaping order in the script is currently wrong).
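The ratios can be recomputed directly from the timings printed above. The snippet below is just that arithmetic; the dictionary layout is my own, and the values are copied from the log.

```python
# Timings copied from the log above, keyed by (number of frames, kernel size).
time_2d = {
    (50, 1): 1.316544, (50, 3): 4.946677, (50, 7): 5.898326,
    (30, 1): 0.786775, (30, 3): 2.987944, (30, 7): 3.638900,
}
time_3d = {
    (50, 1): 1.931768, (50, 3): 9.096771, (50, 7): 42.892779,
    (30, 1): 1.146866, (30, 3): 5.463441, (30, 7): 25.650379,
}

# Speedup of the 2D path over the 3D path for each configuration.
speedup = {key: time_3d[key] / time_2d[key] for key in time_2d}

for (frames, k), ratio in sorted(speedup.items()):
    print(f"frames={frames:2d}  K={k}  3D/2D = {ratio:.2f}x")
```

This gives roughly 1.5x for K = 1, 1.8x for K = 3 and 7x for K = 7, for both frame counts.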

Is this a common case (a 3D conv with kernel size 1 in the temporal dimension)?

Yes, videos are one example: if we downsample them spatially, we don't want to perform temporal striding. Also, if you want to change the number of channels using a kernel of size 1 (for example in a residual module, as we do), the convolution kernels are (1,1,1), which can be optimized as I have measured…
I believe this would be helpful!
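As an illustration of the (1,1,1) case, here is a small sketch showing that such a convolution is just a linear map over the channel axis applied independently at every (t, h, w) position. The sizes are hypothetical, and the `einsum` formulation is only one way to write it, not necessarily the fastest.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
N, C_in, C_out, T, H, W = 2, 8, 16, 4, 6, 6

conv = nn.Conv3d(C_in, C_out, kernel_size=(1, 1, 1))
x = torch.randn(N, C_in, T, H, W)

with torch.no_grad():
    y_conv = conv(x)
    # The (C_out, C_in, 1, 1, 1) kernel is a plain (C_out, C_in) matrix.
    w = conv.weight.reshape(C_out, C_in)
    b = conv.bias.reshape(1, C_out, 1, 1, 1)
    # Mix channels at every position; no neighbouring pixel or frame is read.
    y_lin = torch.einsum("oc,ncthw->nothw", w, x) + b

print((y_conv - y_lin).abs().max().item())
```

Since no spatial or temporal neighbourhood is involved, this case does not need a 3D convolution kernel at all.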