Non-blocking copy from CPU to GPU

Hi,
I need a little help.
I know there is a copy_() function:

Tensor.copy_(src, non_blocking=False) → Tensor
Copies the elements from src into self tensor and returns self

So this function copies a tensor into a specific, already allocated region of memory.
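
To make that concrete, here is a minimal sketch of how I understand copy_() with a pre-allocated destination. The names src and dst, the shapes, and the CPU fallback when no GPU is present are my own assumptions for illustration. As far as I know, the host tensor has to be in pinned memory for non_blocking=True to be truly asynchronous.

```python
import torch

# Assumption: fall back to the CPU when no GPU is present so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

src = torch.arange(6, dtype=torch.float32).reshape(2, 3)
if device.type == "cuda":
    # Pinned (page-locked) host memory is what allows an asynchronous copy.
    src = src.pin_memory()

# Destination allocated up front; copy_() writes into this existing storage.
dst = torch.empty(src.shape, dtype=src.dtype, device=device)
ptr_before = dst.data_ptr()

dst.copy_(src, non_blocking=True)

if device.type == "cuda":
    # Wait for the asynchronous copy to finish before reading the result.
    torch.cuda.synchronize()
```

After the copy, dst still lives at the same address (dst.data_ptr() is unchanged); only its contents were overwritten.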
I want to know whether there is a similar function that copies a whole module into a specific allocation at once (not each tensor alone), in a non-blocking way.

Something like .to(), which transfers the whole module,
but into a specific allocation (which .to() does not support).
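
To show the behaviour I am after, here is a rough sketch, not an existing PyTorch API. copy_module_into is a name I made up, and it assumes the destination module already exists on the target device with the same architecture, so that its parameters serve as the "specific allocation". It still loops over the tensors internally, so it is only a stand-in for the single call I am looking for.

```python
import torch
from torch import nn


def copy_module_into(src: nn.Module, dst: nn.Module, non_blocking: bool = True) -> nn.Module:
    """Hypothetical helper: copy every parameter and buffer of src into
    the tensors that dst already owns, without reallocating them."""
    src_state = src.state_dict()
    dst_state = dst.state_dict()
    with torch.no_grad():
        for name, dst_tensor in dst_state.items():
            # state_dict() tensors share storage with the module's parameters,
            # so this in-place copy updates dst itself.
            dst_tensor.copy_(src_state[name], non_blocking=non_blocking)
    return dst


# Assumption: fall back to the CPU when no GPU is present so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cpu_model = nn.Linear(4, 2)
if device.type == "cuda":
    # Pin the host-side parameters so the copies can be asynchronous.
    for p in cpu_model.parameters():
        p.data = p.data.pin_memory()

# Pre-allocated destination module on the target device.
gpu_model = nn.Linear(4, 2).to(device)
ptrs_before = [p.data_ptr() for p in gpu_model.parameters()]

copy_module_into(cpu_model, gpu_model)

if device.type == "cuda":
    # Wait for the asynchronous copies before using the destination module.
    torch.cuda.synchronize()
```

The destination parameters keep their addresses and only their values change, which is the "copy into a specific allocation" behaviour I would like to get for a whole module in one non-blocking call.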

Thanks:)

:pray:
Has anyone done such a copy?