Do tensors on the GPU perform non-arithmetic operations faster than tensors on the CPU?

As we all know, the GPU can accelerate arithmetic operations on tensors, such as + - * /.

In the same way, do tensors on the GPU perform non-arithmetic operations faster than tensors on the CPU?

For example, I want to set all elements of the tensor that are smaller than 10000 to 123:

import torch

a = torch.tensor([i for i in range(100000)])         # CPU tensor
a[(a < 10000).nonzero(as_tuple=True)] = 123          # operation 1 (CPU)

a = torch.tensor([i for i in range(100000)]).cuda()  # CUDA tensor
a[(a < 10000).nonzero(as_tuple=True)] = 123          # operation 2 (GPU)

Is operation 2 necessarily faster than operation 1?
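
Here is a rough timing sketch I could use to compare the two (just my own attempt, assuming a CUDA device is available; time_op is a made-up helper, torch.arange builds the same values as the list comprehension above, and torch.cuda.synchronize() is called because CUDA kernels launch asynchronously):

import time
import torch

def time_op(device, repeats=100):
    # Hypothetical helper: builds the same tensor as above on the given
    # device and returns the average seconds per masked assignment.
    a = torch.arange(100000, device=device)
    a[(a < 10000).nonzero(as_tuple=True)] = 123  # warm-up run
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        a[(a < 10000).nonzero(as_tuple=True)] = 123
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the asynchronous CUDA kernels to finish
    return (time.perf_counter() - start) / repeats

print("cpu :", time_op("cpu"))
if torch.cuda.is_available():
    print("cuda:", time_op("cuda"))

Timing both variants this way should at least show whether the CUDA version actually wins for this tensor size on my machine, but I am not sure it generalizes.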

Would anyone like to share their opinion?