I try to run this operation after the forward pass and it takes more than 100 ms. That's strange, because the forward itself only needs 2 ms.
I thought maybe freeing GPU memory was taking a long time, so I tried letting it sleep for one second after the forward — and then it runs fast.
When you measure the runtime of forward, did you use cudaStreamSynchronize(...) (which is equivalent to torch.cuda.synchronize() in Python)? This can affect runtime measurement because CUDA is by default asynchronous.
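To illustrate the point about asynchronous execution, here is a minimal sketch of how a forward pass could be timed with an explicit synchronize before stopping the clock. The model and input shapes are made up for illustration; the key calls are `torch.cuda.synchronize()` around the region being measured:

```python
import time
import torch

def timed_forward(model, x):
    # Flush any pending kernels so earlier work isn't counted.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = model(x)
    # Without this sync, the timer stops while kernels may still be
    # queued on the GPU, so "forward" looks artificially fast and the
    # next blocking call (e.g. your 100 ms operation) absorbs the cost.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return out, elapsed_ms

# Hypothetical example model and input, just for demonstration.
model = torch.nn.Linear(128, 128)
x = torch.randn(32, 128)
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
out, ms = timed_forward(model, x)
print(f"forward took {ms:.3f} ms")
```

If the forward still reports ~2 ms with the synchronize in place, the 100 ms is genuinely elsewhere; if it jumps, the cost was simply being deferred to the next synchronizing call.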