Synchronization slowdown caused by .item() that is not caused by .data[0]

Yes, GPU operations are queued asynchronously, so they can run in the background while your Python script continues executing.
Once you reach a point where you copy the result of a GPU op to the CPU or print it, the script has to wait for the GPU, so a synchronization point is added automatically.
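Timing is therefore a bit tricky: a naive timer often measures only how long it took to launch the op rather than the true GPU execution time, and the wait shows up at whichever line triggers the synchronization.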
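
Here is a minimal sketch of how you could time an op with and without an explicit synchronization. The helper `time_op`, the matrix size, and the matmul workload are just assumptions for illustration and are not taken from your code. It falls back to the CPU if no GPU is available, in which case both timings will be roughly the same.

```python
import time
import torch

def time_op(fn, sync=True):
    """Return (seconds, result) for fn(); hypothetical helper for illustration."""
    use_cuda = torch.cuda.is_available()
    if sync and use_cuda:
        torch.cuda.synchronize()  # wait for previously queued GPU work
    start = time.perf_counter()
    result = fn()
    if sync and use_cuda:
        torch.cuda.synchronize()  # wait for the kernels launched by fn
    return time.perf_counter() - start, result

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)

# Without synchronize, a GPU run typically measures little more than the launch.
launch_only, _ = time_op(lambda: x @ x, sync=False)
# With synchronize, the timer also covers the actual GPU execution.
full, y = time_op(lambda: x @ x, sync=True)
print(f"without synchronize: {launch_only:.6f}s, with synchronize: {full:.6f}s")

# .item() copies a scalar to the CPU, so it is itself a synchronization point.
value = y.sum().item()
```

If you time a line containing `.item()` without synchronizing beforehand, the measurement will likely include the time of all GPU work that was still pending, not just the copy itself.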