Why are custom autograd functions so slow? Is there any way to speed them up?

For example, the custom ReLU defined in https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
is much slower than nn.ReLU.

Why is that? Can it be remedied?


It depends on what you mean by “much slower”. It runs in Python, while the nn.ReLU version is implemented entirely in C++, so the built-in one will be a bit faster.
Do you have benchmarks that show surprising results?

They also interfere with multi-threaded backward, don’t they?

Well, “multi-threaded backward” is hard to get right now anyway (it’s very unlikely that you’re using it unless you’re trying very hard) :wink:
But yes, they will acquire the GIL.