I run this operation right after the forward pass, and it takes more than 100 ms. That is strange, because the forward pass itself only takes 2 ms.
I thought it was because freeing GPU memory takes a lot of time. When I let the program sleep for one second after the forward pass, the operation runs fast.
boundingBoxesOfOne = torch::masked_select(boundingBoxesOfOne, mask).detach().cpu();
Is there any way to make it run fast?
I will be very grateful for any help.