Time difference between model's forward method and model(input) call

I have a network with a rather complicated forward pass that includes several for-loops, and I attributed the 7 seconds one forward pass needs to complete to these loops. These 7 seconds were measured with:

import time

start = time.process_time()
output = network(input_batch)
end = time.process_time()

However, if I measure the time inside the forward method of the model (starting before the first statement and stopping just before the return), I get around 0.03 seconds.
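For reference, the "timing inside forward" setup looks roughly like this (a sketch with a made-up small module standing in for the actual network):

```python
import time
import torch

class SlowNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(128, 128)

    def forward(self, x):
        start = time.process_time()   # start before the first statement
        for _ in range(10):           # stand-in for the complicated for-loops
            x = self.fc(x)
        elapsed = time.process_time() - start
        print(f"inside forward: {elapsed:.4f} s")
        return x                      # stop just before the return
```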

Where does this difference come from and can I do something to reduce it?

If you are executing the forward pass on the GPU, you should call torch.cuda.synchronize() before starting and before stopping the timer, as CUDA operations are executed asynchronously.
Currently you might only be timing the kernel launches, or the time until some other operation creates a synchronization point.
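A minimal sketch of that timing pattern (the helper name and the dummy model are made up for illustration; it falls back to plain CPU timing when no GPU is available):

```python
import time
import torch

def timed_forward(model, inp):
    """Time one forward pass, synchronizing around the timer on the GPU."""
    if inp.is_cuda:
        torch.cuda.synchronize()      # wait for any pending kernels first
    start = time.perf_counter()
    out = model(inp)
    if inp.is_cuda:
        torch.cuda.synchronize()      # wait until the forward pass has finished
    return out, time.perf_counter() - start

model = torch.nn.Linear(256, 256)
inp = torch.randn(32, 256)
if torch.cuda.is_available():
    model, inp = model.cuda(), inp.cuda()

out, elapsed = timed_forward(model, inp)
print(f"forward pass took {elapsed:.6f} s")
```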

I’m currently only using the CPU, as I haven’t gotten my custom component to run efficiently on the GPU yet.

That’s interesting. Could you post an executable code snippet so that we could have a look?

I actually made a stupid mistake in my time tracking in the forward method. Now everything works properly and the times match up.

Thanks anyway for the help and the GPU tip!