To conv2d or to conv3d, that is the question?

I was thinking: if I have an input of 128x128x128, it can be represented either as 128 2D arrays of 128x128 or as a single 128x128x128 3D array. Which is more efficient in terms of memory usage and speed on a single GPU?

You could profile both workloads using e.g. torch.utils.benchmark in the nightly release, or profile them manually. I guess the 2D layer would be faster, but YMMV depending on the hardware and system setup.
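As a rough starting point, here is a minimal sketch of such a comparison. The layer sizes (1 -> 16 channels for 3D, 128 -> 128 for 2D), the kernel size, and the helper name time_layer are assumptions picked for illustration, not values from this thread, so swap in your real architecture before drawing conclusions.

```python
import torch
import torch.nn as nn
from torch.utils import benchmark

device = "cuda" if torch.cuda.is_available() else "cpu"

# One 128x128x128 volume, viewed two ways.
x3d = torch.randn(1, 1, 128, 128, 128, device=device)  # (N, C=1, D, H, W)
x2d = x3d.squeeze(1)                                   # (N, C=128, H, W): slices as channels

# Assumed first layers; the channel counts are illustrative only.
conv3d = nn.Conv3d(1, 16, kernel_size=3, padding=1).to(device)
conv2d = nn.Conv2d(128, 128, kernel_size=3, padding=1).to(device)


def time_layer(layer, x, label, runs=10):
    """Hypothetical helper: time a forward pass with torch.utils.benchmark."""
    timer = benchmark.Timer(
        stmt="layer(x)",
        globals={"layer": layer, "x": x},
        label=label,
    )
    return timer.timeit(runs)


with torch.no_grad():
    m3d = time_layer(conv3d, x3d, "conv3d 1->16")
    m2d = time_layer(conv2d, x2d, "conv2d 128->128")

print(m3d)
print(m2d)

# Peak memory is only reported on CUDA; on CPU this part is skipped.
if device == "cuda":
    for name, layer, x in (("conv3d", conv3d, x3d), ("conv2d", conv2d, x2d)):
        torch.cuda.reset_peak_memory_stats()
        with torch.no_grad():
            layer(x)
        torch.cuda.synchronize()
        print(name, torch.cuda.max_memory_allocated() / 2**20, "MiB peak")
```

This only times the forward pass of a single layer. For training you would also want to time the backward pass, since the activations stored for it are likely where the memory difference shows up.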

Hi ptrblck,

I assumed that 2D would be quicker, but then I got thinking about the impact of the number of channels. If you use 3D, you start with 1 channel and increase to ‘x’. If you use 2D, you start with 128 channels (one per slice) and have to increase from there to a suitable number, so the number of channels becomes very large very quickly, unless you adopt the Inception-style approach…

This then made me wonder what is going to affect the efficiency of the network more: the extra dimension or the number of channels?
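To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in plain Python. It assumes stride 1, "same" padding, a 3x3 (or 3x3x3) kernel, and made-up channel counts (1 -> 16 for 3D, 128 -> 128 for 2D); conv_cost is a hypothetical helper, not a PyTorch function.

```python
def conv_cost(c_in, c_out, spatial, k=3):
    """Parameters, multiply-accumulates (MACs) and output elements for one
    conv layer, assuming stride 1 and 'same' padding."""
    kernel_volume = k ** len(spatial)      # 9 for 2D, 27 for 3D
    positions = 1
    for size in spatial:
        positions *= size                  # output positions per channel
    params = c_out * (c_in * kernel_volume + 1)   # +1 for the bias
    macs = c_out * positions * c_in * kernel_volume
    activations = c_out * positions        # elements in the output tensor
    return params, macs, activations


# 3D view: 1 input channel over a 128x128x128 volume (assumed 16 output channels).
cost_3d = conv_cost(1, 16, (128, 128, 128))
# 2D view: 128 slices as input channels over 128x128 (assumed 128 output channels).
cost_2d = conv_cost(128, 128, (128, 128))

for name, (params, macs, acts) in (("conv3d", cost_3d), ("conv2d", cost_2d)):
    print(f"{name}: params={params:,} MACs={macs:,} output elements={acts:,}")
```

With these assumed sizes the 2D layer has far more parameters, while the 3D layer produces 16 times as many output elements, so activation memory may well dominate in the 3D case. Actual speed still depends on the kernels the backend selects, so this complements a benchmark rather than replacing it.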

I’ll have a look at running some benchmark tests and get back to you…