Or is CUDA code faster most of the time?
I have seen quite a few repos that write the backward pass themselves in CUDA, and I have not yet had time to test which approach is faster.
Hopefully autograd is optimized well enough that I do not have to worry about writing CUDA.