An interesting paper appeared at ICML 2017: Memory-Efficient Convolution (MEC) proposes a new way to compute convolution with much less memory and noticeably faster performance.
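The core idea of MEC is a more compact lowering than the classical im2col: instead of materializing one row per output pixel (O(oh·ow·kh·kw) memory), it copies one slab of kw input columns per horizontal output position (O(ow·ih·kw) memory) and then runs a small GEMM per output row over shifted views of that lowered matrix. A minimal single-channel, stride-1, no-padding sketch in NumPy, with names of my own choosing:

```python
import numpy as np

def mec_conv2d(I, K):
    """MEC-style convolution (cross-correlation) of a 2D input I with kernel K.

    Single channel, stride 1, no padding -- a sketch of the lowering idea,
    not the full multi-channel algorithm from the paper.
    """
    ih, iw = I.shape
    kh, kw = K.shape
    oh, ow = ih - kh + 1, iw - kw + 1

    # Lowering: one slab of kw input columns per horizontal output position.
    # L needs ow * ih * kw floats, versus oh * ow * kh * kw for im2col.
    L = np.empty((ow, ih, kw), dtype=I.dtype)
    for w in range(ow):
        L[w] = I[:, w:w + kw]
    L = L.reshape(ow, ih * kw)

    # One small GEMM per output row, each over a shifted view of L:
    # columns [h*kw, h*kw + kh*kw) of row w hold the patch I[h:h+kh, w:w+kw].
    k = K.reshape(kh * kw)
    O = np.empty((oh, ow), dtype=I.dtype)
    for h in range(oh):
        O[h] = L[:, h * kw:(h + kh) * kw] @ k
    return O
```

The oh small GEMMs all read overlapping slices of the same lowered buffer, which is where the memory saving comes from; on a GPU they are the natural candidates for a batched GEMM call.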
Empirically, our code was most efficient for convolutions involving a large number of planes. But a couple of months later NVIDIA released cuDNN, and we did not think our code was any faster. In fact, a comparison with cuDNN is missing from the MEC paper, because cuDNN is not open source. This is unfortunate.
The MEC authors use cublasSgemmBatched and report that it helps a lot. We discussed it, but I do not remember whether we tried it. We did, on the other hand, try calling cublasSgemm on multiple streams, but that did not help on our combination of CUDA and GPU. Things may have improved…
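The appeal of a batched GEMM is that MEC's many small, independent matrix products (one per output row) can be dispatched in a single call, amortizing kernel-launch overhead, rather than looped one by one or spread across streams. The arithmetic is identical either way; a sketch of the equivalence using NumPy's stacked matmul, with illustrative shapes of my own choosing:

```python
import numpy as np

# Illustrative sizes: oh output rows, ow output columns, kh*kw kernel taps.
oh, ow, khkw = 4, 5, 9
rng = np.random.default_rng(0)
A = rng.standard_normal((oh, ow, khkw))  # one lowered slice per output row
k = rng.standard_normal((khkw, 1))       # flattened kernel

# Looped formulation: one small GEMM per output row
# (analogous to repeated cublasSgemm calls).
looped = np.stack([A[h] @ k for h in range(oh)])

# Batched formulation: a single matmul over the leading batch dimension
# (analogous to one cublasSgemmBatched call).
batched = A @ k  # shape (oh, ow, 1)

assert np.allclose(looped, batched)
```

On the CPU the two run the same FLOPs; the batched form only pays off on hardware where per-call dispatch overhead dominates small GEMMs, which is the situation the MEC paper describes.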