If I perform a Conv3d operation where I don't have any temporal stride, it is as good as performing a spatial convolution on a batch of images.
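To make the claim concrete, here is a minimal sketch of the equivalence check. It assumes that "no temporal stride" means the 3D kernel has extent 1 along time, i.e. `kernel_size=(1, k, k)`. The tensor sizes are hypothetical and this is not taken from the linked scripts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
N, C_in, C_out, T, H, W, k = 2, 3, 4, 5, 8, 8, 3

# Assumption: the 3D kernel covers a single frame along the time axis.
conv3d = nn.Conv3d(C_in, C_out, kernel_size=(1, k, k), padding=(0, k // 2, k // 2))
conv2d = nn.Conv2d(C_in, C_out, kernel_size=k, padding=k // 2)

# Share parameters: drop the singleton temporal axis of the 3D kernel.
with torch.no_grad():
    conv2d.weight.copy_(conv3d.weight.squeeze(2))
    conv2d.bias.copy_(conv3d.bias)

x = torch.randn(N, C_in, T, H, W)  # Conv3d layout: (N, C, T, H, W)

with torch.no_grad():
    y3d = conv3d(x)  # (N, C_out, T, H, W)

    # Move time next to the batch axis before flattening,
    # so that each frame keeps its own channels.
    frames = x.permute(0, 2, 1, 3, 4).reshape(N * T, C_in, H, W)
    y2d = conv2d(frames)  # (N*T, C_out, H, W)

    # Undo the flattening to get back to the Conv3d layout.
    y2d = y2d.reshape(N, T, C_out, H, W).permute(0, 2, 1, 3, 4)

max_diff = (y3d - y2d).abs().max().item()
equivalent = torch.allclose(y3d, y2d, atol=1e-4)
print(f"max abs difference: {max_diff:.2e}, equivalent: {equivalent}")
```

The `permute` before the `reshape` matters: reshaping `(N, C, T, H, W)` directly to `(N*T, C, H, W)` would mix channels and frames and the two outputs would no longer match.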

So I ran some timing tests (I tried to synchronize CUDA as described in this post, using cuda.synchronize()).
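For reference, a minimal sketch of how such a timing could be synchronized. The helper name `time_forward`, the iteration counts and the tensor sizes are my own assumptions, not the contents of the linked scripts. It falls back to the CPU when CUDA is not available.

```python
import time

import torch


def time_forward(module, x, n_iters=10, n_warmup=2):
    """Return the average wall-clock seconds per forward pass (hypothetical helper)."""
    use_cuda = x.is_cuda
    with torch.no_grad():
        # Warm-up runs so one-time setup costs are not measured.
        for _ in range(n_warmup):
            module(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(n_iters):
            module(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for the timed kernels to finish
        elapsed = time.perf_counter() - start
    return elapsed / n_iters


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 3, 4, 16, 16, device=device)  # illustrative size only
conv = torch.nn.Conv3d(3, 8, kernel_size=(1, 3, 3)).to(device)
elapsed = time_forward(conv, x)
print(f"{elapsed:.6f} s per forward pass on {device}")
```

Without the second `torch.cuda.synchronize()`, the clock may stop while kernels are still running, because CUDA calls return before the work completes.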

The scripts can be found here for 2D convolutions and here for 3D convolutions.

Here are the outputs:

```
---------------------------------------------------------------------------
Number Of Frames = 50
----------------------------------------
going to time2D K = 1
Time for 1 Kernel ---- 1.316544 ----
----------------------------------------
going to time2D K = 3
Time for 3 Kernel ---- 4.946677 ----
----------------------------------------
going to time2D K = 7
Time for 7 Kernel ---- 5.898326 ----
---------------------------------------------------------------------------
Number Of Frames = 50
----------------------------------------
going to time3D K = 1
Time for 1 Kernel ---- 1.931768 ----
----------------------------------------
going to time3D K = 3
Time for 3 Kernel ---- 9.096771 ----
----------------------------------------
going to time3D K = 7
Time for 7 Kernel ---- 42.892779 ----
---------------------------------------------------------------------------
Number Of Frames = 30
----------------------------------------
going to time2D K = 1
Time for 1 Kernel ---- 0.786775 ----
----------------------------------------
going to time2D K = 3
Time for 3 Kernel ---- 2.987944 ----
----------------------------------------
going to time2D K = 7
Time for 7 Kernel ---- 3.638900 ----
---------------------------------------------------------------------------
Number Of Frames = 30
----------------------------------------
going to time3D K = 1
Time for 1 Kernel ---- 1.146866 ----
----------------------------------------
going to time3D K = 3
Time for 3 Kernel ---- 5.463441 ----
----------------------------------------
going to time3D K = 7
Time for 7 Kernel ---- 25.650379 ----
```

For kernel size 7 (spatial) the 2D version is roughly 7 times faster, for kernel size 3 (spatial) it is almost twice as fast, and even for kernel size 1 it gives a speedup of about 1.5 times (unless these are not the actual times and they are skewed by synchronization effects).
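The ratios can be recomputed from the timings listed in the output above. This is only arithmetic on the logged numbers; the dictionary names are mine.

```python
# Timings copied from the output above, keyed by (number of frames, kernel size).
time_2d = {
    (50, 1): 1.316544, (50, 3): 4.946677, (50, 7): 5.898326,
    (30, 1): 0.786775, (30, 3): 2.987944, (30, 7): 3.638900,
}
time_3d = {
    (50, 1): 1.931768, (50, 3): 9.096771, (50, 7): 42.892779,
    (30, 1): 1.146866, (30, 3): 5.463441, (30, 7): 25.650379,
}

# How many times longer the 3D version takes than the 2D version.
speedup = {key: time_3d[key] / time_2d[key] for key in time_2d}

for (frames, k), ratio in sorted(speedup.items()):
    print(f"frames={frames} K={k}: 3D time / 2D time = {ratio:.2f}")
```

The ratio is nearly the same for 30 and 50 frames at each kernel size, so it does not seem to depend much on the number of frames in these runs.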

I have taken the time to reshape into account when timing the 2D convolutions (the reshaping order is currently wrong).