Why does the Winograd algorithm speed up convolution, given that MUL and ADD cost the same number of clock cycles on GPU?

Hi zfzhang!

These two papers you cite:

https://arxiv.org/abs/1509.09308
and
http://cs231n.stanford.edu/reports/2016/pdfs/117_Report.pdf

refer to a “minimal filtering algorithm” published by Winograd in 1980.

The Wikipedia link you give for the “Winograd algorithm” in your original
post is for the Strassen-like Coppersmith-Winograd matrix-multiplication
algorithm, published by Coppersmith and Winograd in 1990. Same Winograd,
two different algorithms.

I’m willing to believe that Winograd’s minimal-filtering algorithm can
be used in practice to speed up CNNs.
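
To make the minimal-filtering idea concrete, here is a small sketch of
its simplest case, F(2,3): two outputs of a 3-tap filter computed with
4 data multiplications instead of the 6 a direct evaluation needs. The
function names (direct_f23, winograd_f23) are made up for illustration,
and this is plain Python on scalars, not how a GPU library would
implement it.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter g over 4 inputs d: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via the F(2,3) minimal-filtering form."""
    # Filter transform: depends only on g, so it can be computed once
    # per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Only 4 multiplications involve the data.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 0.0, -1.0]
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

Note that the sketch trades multiplications for extra additions on the
data, so on its own it does not settle the MUL-versus-ADD question; the
first paper's argument, as I read it, is that the transforms are shared
across many filters and channels, so their cost is amortized.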

Best.

K. Frank