Why not use neon's backend?

According to Soumith’s benchmark (https://github.com/soumith/convnet-benchmarks, which has not been updated recently), neon’s custom CUDA backend is quite fast. Nothing new here; we already knew that. So my thought was: why not use that backend to speed up PyTorch, since neon is an open-source project and its backend is C/C++, which should make it interoperable? Are there any restrictions or valid reasons not to try bringing that backend's speed to PyTorch?


The benchmarks are old. cuDNN should be on par with neon now. Also, the data layout for neon is CHWB, which means PyTorch would need to do a transpose to use that layout.
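
To make the layout point concrete, here is a rough sketch of what that transpose looks like. It uses NumPy rather than PyTorch or neon to stay self-contained, and assumes the input is in PyTorch's usual NCHW order (batch, channels, height, width). The helper names `nchw_to_chwb` and `chwb_to_nchw` are made up for illustration and are not part of either library.

```python
import numpy as np

def nchw_to_chwb(x):
    """Permute a (N, C, H, W) array to (C, H, W, N) and copy it so the
    batch dimension becomes the fastest-varying one in memory."""
    return np.ascontiguousarray(np.transpose(x, (1, 2, 3, 0)))

def chwb_to_nchw(x):
    """Inverse permutation: (C, H, W, N) back to (N, C, H, W)."""
    return np.ascontiguousarray(np.transpose(x, (3, 0, 1, 2)))

# Hypothetical batch: 2 images, 3 channels, 4x5 pixels.
batch = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

chwb = nchw_to_chwb(batch)       # shape (3, 4, 5, 2)
restored = chwb_to_nchw(chwb)    # shape (2, 3, 4, 5)
```

Both directions do a full copy of the tensor, so a backend that only accepts CHWB would pay this cost on the way in and again on the way out unless the whole network stays in that layout.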

Thanks for the response Soumith! Any plans on updating the benchmarks?

No plans, they are effectively dead. The space has converged on cuDNN being the best.
Other good benchmarks are being developed that are more end-to-end, for example:


Thanks Soumith 🙂 Appreciate it!