How to profile backward time of ReLU layer

Is there any way to profile the backward time of a ReLU layer directly in code, i.e. without using the torch profiler?

I want to measure the elapsed time of the backward computation of a ReLU layer.
I'm using ReLU with inplace=False.

But with code like the one below,

    # forward
    start = time.time()
    output = layer(input)
    torch.cuda.synchronize()
    end = time.time()
    total = end - start

    # backward
    dummy_grad = torch.rand(output.size()).cuda()
    start = time.time()
    output.backward(dummy_grad)
    torch.cuda.synchronize()
    end = time.time()
    total = end - start

I got the following error:

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Thanks in advance to anyone answering this topic.

To call backward() on an output tensor, a computation graph containing trainable tensors must have been created, so set .requires_grad_(True) on the input and it should work. Also, synchronize the GPU before starting the timers, not only before stopping them.
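
For reference, here is a minimal sketch of the corrected timing code. The layer, tensor shape, and device setup are illustrative assumptions, not taken from the original post; the key changes are requires_grad_ on the input and a synchronize before each timer starts:

    import time
    import torch
    import torch.nn as nn

    # Assumed setup for illustration: a standalone ReLU layer and a synthetic input.
    layer = nn.ReLU(inplace=False).cuda()
    input = torch.randn(64, 1024, device="cuda").requires_grad_(True)

    # forward
    torch.cuda.synchronize()   # finish pending GPU work before starting the timer
    start = time.time()
    output = layer(input)
    torch.cuda.synchronize()   # wait for the forward kernel to complete
    forward_time = time.time() - start

    # backward
    dummy_grad = torch.rand_like(output)
    torch.cuda.synchronize()   # synchronize again before timing the backward pass
    start = time.time()
    output.backward(dummy_grad)
    torch.cuda.synchronize()   # wait for the backward kernel to complete
    backward_time = time.time() - start

    print(f"forward: {forward_time * 1e3:.3f} ms, backward: {backward_time * 1e3:.3f} ms")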

Oh, thank you for the answer, and good point about synchronizing before starting the timers.

And your solution worked!
I created the synthetic input with requires_grad=True, and now I get the expected result.

Thank you again😊