For example, the custom ReLU defined in https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html is much slower than nn.ReLU. Why is that? Can it be remedied?
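For reference, the custom ReLU from that tutorial looks roughly like this (a sketch reproduced from the tutorial, not my exact code):

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Save the input so backward can mask the gradient.
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        # Zero the gradient wherever the input was negative.
        grad_input[input < 0] = 0
        return grad_input
```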
Hi,
It depends on what you mean by “much slower”. It runs in Python, while the nn.ReLU version is implemented entirely in C++, so the built-in one will be somewhat faster.
Do you have a benchmark that shows surprising results?
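If you want to measure it, here is a minimal sketch using torch.utils.benchmark, assuming the MyReLU function from the tutorial above is in scope:

```python
import torch
import torch.nn as nn
from torch.utils import benchmark

x = torch.randn(1024, 1024, requires_grad=True)

# Time forward + backward for the Python custom function...
t_custom = benchmark.Timer(
    stmt="MyReLU.apply(x).sum().backward()",
    globals={"MyReLU": MyReLU, "x": x},
)
# ...and for the built-in implementation, which runs in C++.
t_builtin = benchmark.Timer(
    stmt="relu(x).sum().backward()",
    globals={"relu": nn.ReLU(), "x": x},
)

print(t_custom.timeit(100))
print(t_builtin.timeit(100))
```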
They also interfere with multi-threaded backward, don’t they?
Well, “multi-threaded backward” is hard to hit right now anyway (it’s very unlikely that you’re using it unless you’re trying very hard).
But yes, they will acquire the GIL.
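For completeness, a hypothetical sketch of the scenario being discussed: each thread runs backward on its own independent graph. Any backward that re-enters a Python autograd.Function (like MyReLU above) must hold the GIL, so those calls serialize, while nn.ReLU’s backward stays in C++:

```python
import threading
import torch

def run_backward(relu_fn):
    # Each thread builds and backpropagates its own independent graph.
    x = torch.randn(2048, 2048, requires_grad=True)
    relu_fn(x).sum().backward()

# With MyReLU.apply, every backward call re-enters Python and takes the GIL;
# with torch.relu, the backward stays in C++ and the threads can overlap.
threads = [threading.Thread(target=run_backward, args=(MyReLU.apply,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```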