State of affairs for development w/ remote GPUs as of 2024


I’m a PyTorch newbie and I’m trying to assess whether it’s possible/feasible/reliable enough to develop with a remote GPU in 2024.

The latest information I have found so far is the following post in the forum, which would seem to imply it is: Ease development by running computations on remote GPU - #7 by wayi

However, I want to highlight the following quotes:

> Documentation on RemoteModule says RemoteModule is not currently supported when using CUDA tensors, but you said tensors will be automatically placed to the same cuda device. Am I missing something? If CUDA tensors are not supported now, where can I track progress on this?

> Thanks for pointing this out! The doc is outdated. Actually CUDA tensors are now supported on TensorPipe backend, documented on the same page. I will update the doc soon.

Notably, the documentation page still has this warning, almost 3 years later.

And as of today (Jan 2024) the situation is actually worse, because the TensorPipe backend repository is archived without any notice or hint as to whether it is still supported, whether development has moved elsewhere, or whether it has been superseded.

The original issue about TensorPipe support (pytorch issue 33979) is still open and has had no updates since it was created.

I also tried skimming the first few of the 300+ issues and 1k+ PRs that reference tensorpipe, but the search hits are overwhelmingly support requests related to compilation errors.

My question is: is running functions on remote CUDA devices possible in 2024? Is there an updated, authoritative source or venue I can rely on for this information?