Does .cpu() use the 'CurrentStream'?

Hi,
A) Does the GPU-to-CPU copy (with pinned memory) run in stream s2 or in the default stream (stream 0)?

with torch.cuda.stream(s2):
    s2.wait_stream(torch.cuda.current_stream())
    Tensor.to(device="cpu")    # blocks here
    .......

B)

line1 with torch.cuda.stream(s2):
line2     s2.wait_stream(torch.cuda.current_stream())
line3     Tensor = ...
line4     Tensor.to(device="cpu", non_blocking=True)      <-
line5     TensorB = Tensor...

Does Tensor.to use stream s2 in this case? Should we insert a stream synchronization before or after line4?
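
To make case B concrete, here is a minimal sketch of how it could be probed. It rests on assumptions rather than confirmed behavior: that the copy is enqueued on whatever torch.cuda.current_stream() returns at call time, and that synchronizing s2 afterwards is enough before the host reads the result. The function name d2h_copy_on_side_stream and the explicit pinned buffer are hypothetical, introduced only for illustration, and wait_stream is called before entering the block here so that s2 waits on the surrounding stream rather than on itself.

```python
import torch


def d2h_copy_on_side_stream(n=1024):
    """Copy a GPU tensor into pinned host memory on a side stream.

    Returns (on_s2, host), or None when CUDA is not available.
    """
    if not torch.cuda.is_available():
        return None

    s2 = torch.cuda.Stream()
    # Make s2 wait for work already queued on the surrounding stream.
    s2.wait_stream(torch.cuda.current_stream())

    with torch.cuda.stream(s2):
        # Check which stream is current inside the block by comparing raw handles.
        on_s2 = torch.cuda.current_stream().cuda_stream == s2.cuda_stream
        x = torch.ones(n, device="cuda") * 2
        # Explicit pinned destination buffer (assumption: stands in for Tensor.to).
        host = torch.empty(n, dtype=x.dtype, pin_memory=True)
        host.copy_(x, non_blocking=True)

    # Assumption: the host must wait on s2 before reading `host`.
    s2.synchronize()
    return on_s2, host
```

If on_s2 comes back True, the copy in line4 would presumably be issued on s2 as well, and the synchronization would belong after line4, before anything on the CPU reads the result.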