What would be faster? I thought I might be able to pair matrices and parallelise the computation with batched matrix multiplications, but it was slower: https://gist.github.com/gngdb/70fce4f27cdaeeb3f8f18cf9929e60d3
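For reference, the idea is roughly the following (my sketch, not necessarily identical to the gist): at each step, pair the matrices up and multiply every pair with a single `torch.bmm` call, halving the list length until one matrix remains. `batched_reduce` and the power-of-two length are my assumptions here.

```python
import functools
import torch

def batched_reduce(mats):
    # mats: tensor of shape (N, d, d); N assumed to be a power of two
    # for simplicity. Each iteration multiplies adjacent pairs in one
    # batched call, so only log2(N) bmm launches are needed.
    while mats.shape[0] > 1:
        mats = torch.bmm(mats[0::2], mats[1::2])
    return mats[0]

# Matrix multiplication is associative, so this matches the sequential
# left-to-right product (scaled down to avoid float blow-up).
x = torch.randn(8, 4, 4) / 2
out = batched_reduce(x)
expected = functools.reduce(torch.matmul, x)
```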
Hi gngdb, the recursive function you are using has overhead from concatenation and slicing, so for reducing short lists the naive reduce may be faster. For a list length of 128, however, I got the following result.
functools: CPU 2.2555802155067415, GPU 3.1399718035379047
recursive: CPU 10.128527292804142, GPU 1.773943289081565
So your batching method scales better than the naive reduce.
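A rough way to see the crossover (a sketch; timings vary by machine and backend, and the function names are mine): the naive `functools.reduce` does N-1 sequential matmuls, while the pairwise batched version does only log2(N) `bmm` calls, so the batched one should win for long lists despite the slicing overhead.

```python
import functools
import time
import torch

def naive_reduce(mats):
    # N-1 sequential matrix multiplications.
    return functools.reduce(torch.matmul, mats)

def batched_reduce(mats):
    # log2(N) batched multiplications (N assumed a power of two).
    while mats.shape[0] > 1:
        mats = torch.bmm(mats[0::2], mats[1::2])
    return mats[0]

# Scale by 1/sqrt(d) so the running product stays well-conditioned.
mats = torch.randn(128, 32, 32) / (32 ** 0.5)

for fn in (naive_reduce, batched_reduce):
    start = time.perf_counter()
    for _ in range(100):
        fn(mats)
    print(fn.__name__, time.perf_counter() - start)
```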
Ah, that makes sense, thanks.
Looking at this again today, there's actually a massive mistake. The test checking that the naive reduce and recursive_reduce match was wrong: it was comparing the output of recursive_reduce to itself, so it passed trivially. Fixing that, I realised that the resulting matrix explodes, which makes the test a little difficult, so I have to scale by the square root of M to keep approximately unit variance.