Neural Networks performance question

Hello everyone,

First of all, a brief introduction: I am new to the world of Artificial Neural Networks, but I am an HPC specialist (from OpenMP and MPI to CUDA and OpenCL; yes, I speak Fortran).

I am currently trying to understand the needs of the data analyst / Machine Learning community from a computer-science-centric perspective.

Basically, at the moment, I see that many codes use 2D convolutional networks, which rely heavily on dense matrix-matrix operations (GEMM). Obviously, these work extremely well on cache-based processors, and even better on GPUs.
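To make the GEMM connection concrete, here is a minimal sketch of the classic im2col lowering: every kernel-sized patch of the input is copied into a column of a matrix, so the whole convolution collapses into one dense matrix product. The function name and the single-channel, stride-1, valid-padding setup are simplifications of mine; real frameworks apply the same idea to batched, multi-channel tensors.

```python
import numpy as np

def im2col_conv2d(x, w):
    """2D convolution (valid padding, stride 1) lowered to a single GEMM.

    x: input of shape (H, W); w: kernel of shape (kh, kw).
    Illustrative single-channel sketch, not a framework implementation.
    """
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Gather every kh*kw input patch into one column -> (kh*kw, oh*ow) matrix.
    cols = np.empty((kh * kw, oh * ow))
    for i in range(kh):
        for j in range(kw):
            cols[i * kw + j] = x[i:i + oh, j:j + ow].reshape(-1)
    # The convolution is now a dense matrix product (GEMM).
    return (w.reshape(1, -1) @ cols).reshape(oh, ow)
```

The memory cost of materializing `cols` is the price paid for turning a stencil into the dense, cache-friendly GEMM that BLAS libraries and GPUs are optimized for.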

So my question could be summarized as a chicken-and-egg conundrum: do data scientists put 2D convolutions at the core of their networks because our current hardware is very fast at them, or is there simply no other way to solve these problems?

Formulated differently: as a data scientist, are there cases that you cannot efficiently solve with the current "GEMM-heavy" approach, for instance problems involving sparse tensors, or very large problems that remain incredibly slow even on the latest shiny GPU?

Thank you in advance for your help! I find this domain truly fascinating.


Group convolutions with a large number of groups, dilated convolutions, strided convolutions: all of these are used less than they should be, because they introduce sparsity and cache-unfriendly memory access patterns.
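To illustrate the access-pattern point, here is a small sketch (my own, single-channel, stride-1, valid-padding) of a dilated convolution: dilation stretches the kernel taps apart, so each output reads input elements that are non-contiguous in memory, which is exactly what defeats caching and the tidy im2col-to-GEMM lowering.

```python
import numpy as np

def dilated_conv2d(x, w, dilation=2):
    """2D convolution with a dilated kernel (stride 1, valid padding).

    The effective kernel footprint grows to (k - 1) * dilation + 1, so the
    taps land on non-adjacent rows and columns of x: the strided,
    cache-unfriendly access pattern discussed above.
    """
    kh, kw = w.shape
    d = dilation
    eh, ew = (kh - 1) * d + 1, (kw - 1) * d + 1  # effective footprint
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            # Each tap reads a slab offset by a multiple of the dilation,
            # skipping (d - 1) rows/columns between consecutive taps.
            out += w[i, j] * x[i * d:i * d + oh, j * d:j * d + ow]
    return out
```

With `dilation=1` this reduces to an ordinary convolution; as the dilation grows, the gathered elements spread further apart in memory even though the arithmetic intensity stays the same.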

See the standard references on dilated and strided convolutions for details.

Thank you very much for your insight, this is very instructive!