It can be a problem, depending on the surrounding operations. In particular, it fails if the inplace relu
overwrites a tensor that autograd needs in its original form for the gradient calculation.
This post gives a small example using an inplace div operation.
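
For the relu case specifically, here is a minimal sketch of the same effect (not the code from the linked post). The helper name `backward_succeeds` is made up for illustration. It relies on `torch.tanh` keeping its output for the backward pass, while multiplying by a constant does not need its output.

```python
import torch
import torch.nn.functional as F


def backward_succeeds(op, inplace: bool) -> bool:
    """Apply `op`, then relu (optionally inplace), and report whether backward runs."""
    x = torch.randn(4, requires_grad=True)
    y = op(x)                          # non-leaf tensor produced by `op`
    out = F.relu(y, inplace=inplace)   # inplace=True overwrites `y` itself
    try:
        out.sum().backward()
        return True
    except RuntimeError:
        # autograd detected that a tensor saved for backward was modified inplace
        return False


# tanh saves its output for backward, so overwriting it inplace breaks the backward pass
print(backward_succeeds(torch.tanh, inplace=True))
# the out-of-place relu leaves the tanh output untouched
print(backward_succeeds(torch.tanh, inplace=False))
# multiplying by a constant does not need its output for backward, so inplace relu is fine
print(backward_succeeds(lambda t: t * 2, inplace=True))
```

Only the first call should hit the "modified by an inplace operation" RuntimeError. So whether `inplace=True` is safe depends on what the previous op stored for its backward.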