I have a network with a rather complicated forward pass that includes several for-loops, and one forward pass takes about 7 seconds to complete. These 7 seconds were measured with:
import time

start = time.process_time()
output = network(input_batch)
end = time.process_time()
However, if I take the time inside the forward method of the model (starting before the first statement and stopping just before the return), it comes out to only around 0.03 seconds.
Where does this difference come from, and can I do something to reduce it?
If you are executing the forward pass on the GPU, you should call torch.cuda.synchronize() before starting and before stopping the timer, because CUDA operations are executed asynchronously.
Without synchronization, you might only be timing the kernel launches, while the full execution cost shows up later at some other operation that creates a synchronization point.
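A minimal sketch of the synchronized timing described above; the nn.Linear model and the input shapes here are placeholders for your own network and batch. It also uses time.perf_counter(), which measures wall-clock time and so includes time spent waiting for the GPU, and it guards the synchronize calls so the snippet also runs on a CPU-only machine:

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-in for your network; replace with your own model.
model = nn.Linear(128, 128)
input_batch = torch.randn(32, 128)

if torch.cuda.is_available():
    model = model.cuda()
    input_batch = input_batch.cuda()

def timed_forward(model, batch):
    # Drain any pending CUDA work before starting the clock,
    # so earlier asynchronous kernels are not charged to this pass.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    output = model(batch)
    # Wait for the forward kernels to actually finish before stopping,
    # otherwise you only measure the (cheap) kernel launches.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    end = time.perf_counter()
    return output, end - start

output, elapsed = timed_forward(model, input_batch)
print(f"forward pass took {elapsed:.6f} s")
```

With both synchronize calls in place, the time measured around network(input_batch) and the time measured inside forward should agree much more closely.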