Data to device is quite slow

I ran the same code on two different GPUs, an RTX 2080 Ti and a Titan Xp.
The time cost of data.to(device) is 0.1935 on the RTX 2080 Ti but 1.0735 on the Titan Xp. Why is there such a huge difference?
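
For reference, here is a minimal sketch of how a host-to-device copy can be timed in isolation. It is not the exact code behind the numbers above: the helper name time_to_device, the batch shape, and the repeat count are placeholders chosen for illustration. CUDA work can run asynchronously, so the sketch calls torch.cuda.synchronize() before reading the clock, and it does one warm-up copy so that CUDA context creation is not counted.

```python
import time

import torch


def time_to_device(batch, device, repeats=10):
    """Return the mean wall-clock seconds for one batch.to(device) call.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    use_cuda = device.type == "cuda"

    # Warm-up copy: the first CUDA call also pays for context creation.
    batch.to(device)
    if use_cuda:
        torch.cuda.synchronize(device)

    start = time.perf_counter()
    for _ in range(repeats):
        batch.to(device)
        if use_cuda:
            # Wait for the copy to finish before reading the clock.
            torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / repeats


# Falls back to the CPU so the sketch still runs without a GPU
# (in that case .to() is a no-op and the time is near zero).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = torch.randn(64, 3, 224, 224)  # assumed batch shape, not from the post
elapsed = time_to_device(batch, device)
print(f"{device}: {elapsed:.4f} s per transfer")
```

Running the same script on each machine, with the same batch, should show whether the gap comes from the copy itself or from something else in the pipeline.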

Is this caused by PCI-E? The RTX 2080 Ti runs at PCI-E x16, but the Titan Xp runs at PCI-E x8. When the data batch is small, there is no obvious difference.
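
A rough back-of-the-envelope check of the PCI-E idea, as a sketch only. It assumes both slots run at PCIe 3.0 speeds (8 GT/s per lane with 128b/130b encoding, roughly 0.985 GB/s per lane) and ignores protocol overhead and pageable-memory staging. The batch size is a made-up example, not the one used in the measurements.

```python
# Approximate usable PCIe 3.0 bandwidth per lane, in GB/s:
# 8 GT/s with 128b/130b encoding, divided by 8 bits per byte.
GBPS_PER_LANE = 8 * (128 / 130) / 8  # about 0.985 GB/s


def ideal_transfer_seconds(num_bytes, lanes):
    """Best-case copy time over a PCIe 3.0 link with the given lane count."""
    return num_bytes / (lanes * GBPS_PER_LANE * 1e9)


# Assumed batch: 256 float32 images of 3 x 224 x 224 (4 bytes per element).
batch_bytes = 256 * 3 * 224 * 224 * 4

t16 = ideal_transfer_seconds(batch_bytes, 16)
t8 = ideal_transfer_seconds(batch_bytes, 8)
print(f"x16: {t16:.4f} s, x8: {t8:.4f} s, ratio: {t8 / t16:.1f}")
```

Under these assumptions, halving the lane count only doubles the ideal copy time, while the measured gap (1.0735 vs 0.1935) is about 5.5 times, so the lane width alone may not explain all of it.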