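You have to add torch.cuda.synchronize() to your benchmark, because GPU operations are executed asynchronously (see here).
When you stop the timer, your model has probably not finished computing yet, so the transfer of the output to the CPU has to wait for it.
As a result, the model's run time gets attributed to the transfer instead of to the forward pass.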
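A minimal sketch of a benchmark that synchronizes before reading the clock. It assumes a PyTorch model. The benchmark helper, its n_iters parameter, and the nn.Linear stand-in model are hypothetical illustrations, not code from the question. It falls back to the CPU when no GPU is available, in which case no synchronization is needed.

```python
import time

import torch


def benchmark(model, inputs, n_iters=10):
    """Return (mean seconds per forward pass, last output).

    CUDA kernels are launched asynchronously, so without a sync
    time.perf_counter() would mostly measure kernel launch overhead.
    torch.cuda.synchronize() blocks the host until all queued work is done.
    """
    use_cuda = inputs.is_cuda
    with torch.no_grad():
        # Warm-up pass (CUDA context creation, cuDNN autotuning, etc.).
        model(inputs)
        if use_cuda:
            torch.cuda.synchronize()

        start = time.perf_counter()
        for _ in range(n_iters):
            out = model(inputs)
        if use_cuda:
            # Wait for the GPU to finish before reading the clock.
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / n_iters, out


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(1024, 1024).to(device)  # hypothetical stand-in model
    x = torch.randn(64, 1024, device=device)
    per_iter, out = benchmark(model, x)
    print(f"{device}: {per_iter * 1e3:.3f} ms per forward pass")
    # This copy is now cheap: the compute already finished inside the timed region.
    out_cpu = out.cpu()
```

Because the synchronize call happens before the second clock read, the measured time covers the actual GPU work. The later out.cpu() then only measures the copy itself.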