Correct order of BatchNorm1d and AvgPool1d

I see performance differences when I swap the order of BatchNorm1d and AvgPool1d.

Which order is mathematically correct, and which is more efficient?
Why should there be a difference at all?


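In BatchNorm you compute a mean and a variance per channel, which takes a single pass over the data, so it's O(N) in the total number of elements, just like AvgPool. AvgPool reduces the number of elements, though, so I think you would improve the performance by doing the AvgPool first and then the BatchNorm, which then has less data to process. The two orders are also not equivalent during training: the batch variance of the pooled values is generally smaller than that of the raw values, so the normalized outputs differ.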

But that’s just a feeling, not a precise explanation.
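
To make the comparison concrete, here is a minimal pure-Python sketch of both orders on a single channel. It is only an illustration under stated assumptions, not PyTorch code: the helper names avg_pool_1d and batch_norm_1d are hypothetical, pooling is non-overlapping (stride equal to the kernel size, no padding), and the normalization uses training-style batch statistics (biased variance) with no learnable scale or shift.

```python
from statistics import fmean


def avg_pool_1d(xs, kernel_size):
    """Non-overlapping average pooling (assumed: stride == kernel_size, no padding)."""
    n_out = len(xs) // kernel_size
    return [fmean(xs[i * kernel_size:(i + 1) * kernel_size]) for i in range(n_out)]


def batch_norm_1d(xs, eps=1e-5):
    """Normalize one channel with its own mean and biased variance (no affine)."""
    mean = fmean(xs)
    # Mean and variance each need one pass over the data.
    var = fmean([(x - mean) ** 2 for x in xs])
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]


signal = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0, 0.0, 4.0]  # made-up example values
k = 2

# Order A: pool first, so the normalization only sees len(signal) // k values.
pool_then_norm = batch_norm_1d(avg_pool_1d(signal, k))

# Order B: normalize all len(signal) values, then pool.
norm_then_pool = avg_pool_1d(batch_norm_1d(signal), k)

print("pool -> norm:", pool_then_norm)
print("norm -> pool:", norm_then_pool)
```

Both results have the same length, but the values differ: in order A the output has roughly unit variance, while in order B averaging after normalization shrinks the variance below one. So the choice of order affects what the next layer sees during training, not just the amount of work done.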

Thanks for the information, I will test it to verify your assertion.
