Pytorch with CUDA Unified Memory

Sure, you can check this hackathon entry from this summer: Introducing SpeedTorch: 4x speed CPU->GPU transfer, 110x GPU->CPU transfer
Note that the transfer only appears faster because no actual copy happens at that point: with unified (managed) memory, data is migrated lazily, on first access.
Any actual operation on these tensors will be significantly slower, though, because the copy happens then, and such slowdowns are not measured in the submission's benchmark.
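This benchmarking pitfall is easy to reproduce yourself. A minimal sketch (assuming `torch` is installed and a CUDA device is available; the tensor size and timing helper are illustrative, not from the submission): an asynchronous copy from pinned memory returns almost immediately, so a naive timer undercounts it, while timing with `torch.cuda.synchronize()` includes the actual transfer.

```python
# Hedged sketch: why a naive timer can make GPU transfers look "free".
# Assumes torch is installed; the CUDA branch only runs if a device exists.
import time

import torch


def timed(fn):
    """Return the wall-clock time (seconds) taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


if torch.cuda.is_available():
    x = torch.randn(4096, 4096)  # ~64 MB of float32 on the CPU

    # Naive timing: with pinned memory and non_blocking=True, .to("cuda")
    # merely enqueues the copy on the CUDA stream and returns right away.
    xp = x.pin_memory()
    t_naive = timed(lambda: xp.to("cuda", non_blocking=True))

    # Honest timing: synchronize so the measurement includes the copy itself.
    def sync_copy():
        xp.to("cuda", non_blocking=True)
        torch.cuda.synchronize()

    t_sync = timed(sync_copy)
    print(f"naive: {t_naive:.6f}s  synchronized: {t_sync:.6f}s")
else:
    print("CUDA not available; nothing to measure.")
```

The same caveat applies to any benchmark of unified-memory "transfers": unless the measurement forces the data to actually move (a synchronize, or a kernel that touches the data), it is timing the enqueue, not the copy.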