[distributed] Use different backends for CPU and GPU tensors

I want to use the nccl backend for tensors on the GPU and the tcp backend for tensors on the CPU. Is there a way to do this?

PS. The gloo backend is out of the question.