This graph does not show the first execution of the network; it shows one of the many inferences that follow. Thank you for your answer. I think I now understand the extra cost of the first execution of the model. There is one more thing I would like to understand. According to your explanation, the first execution after the model is loaded onto the GPU will inevitably spend time on some initialization operations. If that is unavoidable, can this time be shortened in some way?
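To make the distinction between the first execution and the later inferences concrete, here is a minimal sketch of how the two could be timed separately. It is framework-agnostic and only an illustration: `time_inferences` and `FakeModel` are hypothetical names, and `FakeModel` simply sleeps longer on its first call to mimic one-time initialization. It is not a real GPU model.

```python
import time


def time_inferences(run_inference, n_runs=5):
    """Time n_runs calls and return (first_run_seconds, mean_later_seconds).

    For a real GPU model, run_inference should block until the result is
    ready (i.e. include any device synchronization); otherwise the timing
    may only cover the asynchronous launch.
    """
    if n_runs < 2:
        raise ValueError("n_runs must be at least 2 to compare first vs. later runs")
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_inference()
        timings.append(time.perf_counter() - start)
    later = timings[1:]
    return timings[0], sum(later) / len(later)


class FakeModel:
    """Hypothetical stand-in for a model: the first call pays an extra one-time cost."""

    def __init__(self):
        self.initialized = False

    def __call__(self):
        if not self.initialized:
            time.sleep(0.05)   # simulated one-time initialization
            self.initialized = True
        time.sleep(0.005)      # simulated steady-state inference


first, steady = time_inferences(FakeModel())
print(f"first run: {first * 1000:.1f} ms, later runs (mean): {steady * 1000:.1f} ms")
```

Running one throwaway inference right after loading (a warm-up call) moves that first-run cost out of the measured or user-facing path, although it does not by itself make the initialization shorter.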