Has anyone used Ray for distributed learning?

Does it make sense to use Ray instead of torch.multiprocessing if everything runs on a single computer?
Has anyone used it for multiple clusters?

What are the advantages/disadvantages?
Is there anything to be cautious of?

Thank you