Does non_blocking=True use my stream or a new one?

I’m using the stream context manager (with torch.cuda.stream(s)) to write my custom autograd function. If I do a host-to-device transfer using to(device=d, non_blocking=True) or the copy_ function (again with non_blocking=True), will the data be transferred on my stream s, or will PyTorch create yet another stream to transfer the data? So far my results have been correct, so I’m assuming it’s the former.

The copy will use the current stream, so your custom stream, assuming you call the to() or copy_ op inside the context manager, based on these lines of code.
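A minimal sketch to illustrate this (the sizes, names, and the pinned host tensor are just illustrative assumptions): inside the torch.cuda.stream(s) context, the non_blocking copies are enqueued on s rather than on a newly created stream.

```python
import torch

device = torch.device("cuda")
s = torch.cuda.Stream()

# Pinned host memory is needed for non_blocking=True to actually overlap with compute.
host_tensor = torch.randn(1024, 1024, pin_memory=True)

with torch.cuda.stream(s):
    # Inside the context manager, s is the current stream.
    assert torch.cuda.current_stream() == s

    # Both transfers below are enqueued on the current stream, i.e. on s.
    a = host_tensor.to(device=device, non_blocking=True)
    b = torch.empty_like(a)
    b.copy_(host_tensor, non_blocking=True)

# Synchronize s before consuming a and b on another stream or on the CPU.
s.synchronize()
```

Note that you still need to synchronize (or record/wait on an event) before other streams or the CPU read the copied tensors, since non_blocking only makes the enqueue asynchronous with respect to the host.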
