Hi, thank you for the tag!
I vaguely remember that using non_blocking=True when copying from GPU to CPU might be dangerous (see "Should we set non_blocking to True?", post #18 by sbelharbi), so we only use it when copying from CPU to GPU.
For reference, this is the PR that introduced the feature: "Add default hooks to save tensors on CPU" by Varal7 (pytorch/pytorch, Pull Request #61928).
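To illustrate the asymmetry, here is a minimal sketch of how I understand it, not the code from the PR. The helper names (copy_to_gpu, copy_to_cpu, round_trip) are made up for this example. The concern, as far as I recall, is that an asynchronous GPU to CPU copy can return before the data has actually arrived in host memory, so reading the CPU tensor without synchronizing first may give stale values. A CPU to GPU copy does not have this problem because later GPU work is queued on the same stream.

```python
import torch


def copy_to_gpu(cpu_tensor):
    # Hypothetical helper: CPU -> GPU copy.
    # non_blocking=True only overlaps with host work when the source is in
    # pinned (page-locked) memory, so pin it first.
    return cpu_tensor.pin_memory().to("cuda", non_blocking=True)


def copy_to_cpu(gpu_tensor):
    # Hypothetical helper: GPU -> CPU copy.
    # Left blocking on purpose (non_blocking defaults to False), so the
    # returned tensor is safe to read as soon as this call returns.
    return gpu_tensor.to("cpu")


def round_trip(cpu_tensor):
    # Hypothetical helper: move a tensor to the GPU and back.
    # Falls back to a plain clone on machines without CUDA so the sketch
    # still runs there.
    if not torch.cuda.is_available():
        return cpu_tensor.clone()
    return copy_to_cpu(copy_to_gpu(cpu_tensor))


x = torch.arange(6.0).reshape(2, 3)
y = round_trip(x)
print(torch.equal(x, y))
```

If you do want an asynchronous GPU to CPU copy, my understanding is that you need to call torch.cuda.synchronize() (or synchronize the relevant stream) before touching the result on the CPU side.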