Does non_blocking=True use my stream or a new one?

I’m using the stream context manager (with torch.cuda.stream(s)) to write my custom autograd function. If I do a host-to-device transfer using to(device=d, non_blocking=True) or the copy_ function (again with non_blocking=True), will the data be transferred on my stream s, or will PyTorch create yet another stream to transfer the data? So far my results have been correct, so I’m assuming it’s the former.

The copy will use the current stream, so your custom stream, assuming you call the to() or copy_ op inside the context manager, based on these lines of code.
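A minimal sketch to illustrate this (the sizes, names, and the pinned host tensor are just illustrative assumptions): inside the torch.cuda.stream(s) context, the non_blocking copies are enqueued on s rather than on a newly created stream.

```python
import torch

device = torch.device("cuda")
s = torch.cuda.Stream()

# Pinned host memory is needed for non_blocking=True to actually overlap with compute.
host_tensor = torch.randn(1024, 1024, pin_memory=True)

with torch.cuda.stream(s):
    # Inside the context manager, s is the current stream.
    assert torch.cuda.current_stream() == s

    # Both transfers below are enqueued on the current stream, i.e. on s.
    a = host_tensor.to(device=device, non_blocking=True)
    b = torch.empty_like(a)
    b.copy_(host_tensor, non_blocking=True)

# Synchronize s before consuming a and b on another stream or on the CPU.
s.synchronize()
```

Note that you still need to synchronize (or record/wait on an event) before other streams or the CPU read the copied tensors, since non_blocking only makes the enqueue asynchronous with respect to the host.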
