`data_ptr` after `to` corrupts first 128 bits of CUDA tensor?

Possibly related: Can't obtain right data from `tensor.contiguous().data_ptr()`