Conv3d vs Conv2d speed

I was trying to check the speed difference between Conv2d and Conv3d and was surprised by the results:

import torch
import torch.nn as nn
import time

F = 30
data = torch.randn(1,256, F, 256, 256).cuda()

def time2D():
        conv2d = nn.Conv2d(256,256,3).cuda()
        data1 = data.reshape(F,256,256,256)
        start = time.time()
        print(conv2d(torch.autograd.Variable(data1)).size())
        print("  --- %s ---  " %(time.time() - start))

def time3D():
        conv3d = nn.Conv3d(256,256,(3,3,3)).cuda()
        start = time.time()
        print(conv3d(torch.autograd.Variable(data)).size())
        print("  --- %s ---  " %(time.time() - start))


print("going to time2D")
time2D()
print("going to time3D")
time3D()


and the output was the following

going to time2D
torch.Size([30, 256, 254, 254])
  --- 0.42415595054626465 ---  
going to time3D
torch.Size([1, 256, 28, 254, 254])
  --- 0.00014209747314453125 ---  

Why is Conv3d quicker than Conv2d? This was the opposite of what I expected.
I couldn't think of a good reason for why this is happening. Can someone give some logic, or even intuition, for why this is the case?

Also, if this is correct, why do we use 2D convolutions at all? We could stack all images along the depth dimension and use Conv3d(inFeat, outFeat, (1, kernelX, kernelY))!
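
For context, here is a quick sanity check of what I mean (sizes are arbitrary): a Conv3d with a (1, k, k) kernel computes the same thing as a Conv2d applied to every depth slice, once the two layers share weights.

import torch
import torch.nn as nn

x = torch.randn(1, 8, 5, 32, 32)                # (N, C, D, H, W)

conv3d = nn.Conv3d(8, 16, (1, 3, 3))
conv2d = nn.Conv2d(8, 16, 3)
# Share weights: (16, 8, 1, 3, 3) -> (16, 8, 3, 3)
conv2d.weight.data.copy_(conv3d.weight.data.squeeze(2))
conv2d.bias.data.copy_(conv3d.bias.data)

out3d = conv3d(x)                               # (1, 16, 5, 30, 30)
# Move depth into the batch dimension and reuse the 2D kernel per slice.
out2d = conv2d(x[0].permute(1, 0, 2, 3))        # (5, 16, 30, 30)

print(torch.allclose(out3d[0].permute(1, 0, 2, 3), out2d, atol=1e-5))  # True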

Hi,

Here is a modified version of your code that does proper timing.
The GPU initialization takes some time, so you want to do that before measuring runtimes.
Also, the CUDA API is asynchronous, so you need to synchronize when measuring time; otherwise you only measure the kernel launch, which is why your Conv3d number looks impossibly small.
There can also be some variance in the runtimes depending on GPU usage, so you might want to run more than one iteration (a sketch that averages several runs follows the code below).

import torch
import torch.nn as nn
import time

F = 30
data = torch.randn(1,256, F, 256, 256).cuda()

def time2D():
        conv2d = nn.Conv2d(256,256,3).cuda()
        data1 = data.reshape(F,256,256,256)
        torch.cuda.synchronize()  # make sure pending GPU work is done before timing
        start = time.time()
        out = conv2d(torch.autograd.Variable(data1))
        torch.cuda.synchronize()  # wait for the conv kernel to actually finish
        end = time.time()
        print(out.size())
        print("  --- %s ---  " %(end - start))

def time3D():
        conv3d = nn.Conv3d(256,256,(3,3,3)).cuda()
        torch.cuda.synchronize()
        start = time.time()
        out = conv3d(torch.autograd.Variable(data))
        torch.cuda.synchronize()
        end = time.time()
        print(out.size())
        print("  --- %s ---  " %(end - start))

print("Initializing cuda state")
time2D()  # warm-up call: pays the one-time CUDA/cuDNN initialization cost
print("going to time2D")
time2D()
print("going to time3D")
time3D()
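
If you want more stable numbers, here is a minimal sketch that averages several runs using torch.cuda.Event (the bench helper and the run counts are just an illustration, not a canonical recipe):

import torch
import torch.nn as nn

def bench(fn, n_warmup=3, n_runs=10):
    for _ in range(n_warmup):          # warm-up: pay one-time init costs
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(n_runs):
        fn()
    end.record()
    torch.cuda.synchronize()           # wait for all timed kernels to finish
    return start.elapsed_time(end) / n_runs   # milliseconds per run

F = 30
data = torch.randn(1, 256, F, 256, 256).cuda()
conv2d = nn.Conv2d(256, 256, 3).cuda()
conv3d = nn.Conv3d(256, 256, (3, 3, 3)).cuda()

print("2D: %.2f ms" % bench(lambda: conv2d(data.reshape(F, 256, 256, 256))))
print("3D: %.2f ms" % bench(lambda: conv3d(data)))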

@albanD
On giving it more thought: if I need to do a 3D convolution with a (1 x Kx x Ky) kernel, reshaping the input for a 2D convolution would make sense, right? I would probably save time and definitely save memory.

And similarly, if I have kernel = 1 (used for changing the number of features/channels), then is reshaping the input for a 1D convolution a better choice? Going from a 3D conv to a 1D conv would save a lot of memory, but I don't know if reshaping would have some adverse effects in the forward or backward pass. A quick sketch of what I mean is below.
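
Sketch of the kernel = 1 case (sizes arbitrary): a pointwise Conv3d matches a Conv1d over the flattened depth/spatial positions.

import torch
import torch.nn as nn

x = torch.randn(2, 8, 5, 16, 16)                 # (N, C, D, H, W)

conv3d = nn.Conv3d(8, 16, 1)                     # pointwise: only mixes channels
conv1d = nn.Conv1d(8, 16, 1)
# Share weights: (16, 8, 1, 1, 1) -> (16, 8, 1)
conv1d.weight.data.copy_(conv3d.weight.data.reshape(16, 8, 1))
conv1d.bias.data.copy_(conv3d.bias.data)

out3d = conv3d(x)                                # (2, 16, 5, 16, 16)
out1d = conv1d(x.reshape(2, 8, -1))              # flatten D*H*W into one length

print(torch.allclose(out3d.reshape(2, 16, -1), out1d, atol=1e-5))  # True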

What do you think?

The algorithms used for 1D, 2D and 3D convolution might be slightly different (especially if you use cudnn), so I am not sure you can predict the runtime and memory footprint without trying it.
The reshape is not a problem with respect to the autograd, but it is still one extra operation (even though it is a really cheap one).

If you just want it to work, you can just use the simplest one.
If you really need the last few (potential) percent of performance/memory, then you can benchmark each approach for your input sizes.
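
For the memory side, here is a minimal sketch of how you could compare peak allocations (this assumes a PyTorch version that provides torch.cuda.reset_peak_memory_stats; the sizes are the ones from your post):

import torch
import torch.nn as nn

def peak_mem(fn):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()   # reset the high-water mark
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2   # MiB

F = 30
data = torch.randn(1, 256, F, 256, 256).cuda()
conv3d = nn.Conv3d(256, 256, (1, 3, 3)).cuda()
conv2d = nn.Conv2d(256, 256, 3).cuda()

print("3D path: %.1f MiB" % peak_mem(lambda: conv3d(data)))
# depth -> batch for the equivalent 2D formulation
print("2D path: %.1f MiB" % peak_mem(lambda: conv2d(data[0].permute(1, 0, 2, 3))))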

From what I have seen, Conv3d memory consumption is very high: a 3D residual module (a residual module with its kernels inflated to 3D) uses double the memory of its 2D counterpart (I will present exact figures after checking).
I have been facing memory shortages while working on my model, which consists of 3D CNNs, so if this helps it would be really great!
I will post my results here.

Thanks!

Just for anyone curious: it didn't really help much!