torch.cdist operation using multiple GPUs

I want to compute pairwise distances between two feature matrices, each of size n x f, and get an n x n distance matrix from this.

I am currently using torch.cdist() for this, and was wondering whether there is any way to parallelize it across GPUs, similar to what FAISS does (facebookresearch/faiss on GitHub, a library for efficient similarity search and clustering of dense vectors).

Or do I need to write a custom implementation for this using torch multiprocessing?
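
For reference, the kind of manual split I have in mind looks roughly like the sketch below. `multi_device_cdist` is just a name I made up, not a PyTorch API, and it assumes the whole second matrix fits on every device. It splits the rows of the first matrix across the available GPUs (falling back to the CPU if there are none), calls torch.cdist on each chunk, and concatenates the results.

```python
import torch


def multi_device_cdist(x, y, devices=None, p=2.0):
    """Hypothetical helper: row-split x across devices and run torch.cdist per chunk."""
    if devices is None:
        n_gpu = torch.cuda.device_count()
        # Use every visible GPU, or fall back to the CPU if there are none.
        devices = [torch.device(f"cuda:{i}") for i in range(n_gpu)] or [torch.device("cpu")]

    # torch.chunk may return fewer chunks than devices when x has few rows;
    # zip() below then simply leaves the extra devices unused.
    chunks = torch.chunk(x, len(devices), dim=0)

    partial = []
    for chunk, dev in zip(chunks, devices):
        # Each device gets one block of rows from x plus a full copy of y.
        # CUDA kernels are launched asynchronously, so the per-device calls may overlap.
        partial.append(torch.cdist(chunk.to(dev), y.to(dev), p=p))

    # Gather the row blocks back on the device of x, in the original row order.
    return torch.cat([block.to(x.device) for block in partial], dim=0)


if __name__ == "__main__":
    a = torch.randn(1000, 64)
    b = torch.randn(1000, 64)
    dist = multi_device_cdist(a, b)
    print(dist.shape)
```

This keeps everything in a single process, but I am not sure whether it actually keeps all GPUs busy or whether the host-to-device copies end up serializing the work.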

Thanks for the help in advance!