with torch.cuda.stream(s):
    w = sample_indices.cpu().numpy()
    q = action_probs.detach().cpu().numpy()
s.synchronize()

Later, when I try to access w, I get an out-of-bounds error, which does not happen if I do the .cpu() transfers on the same stream. Looking at the value of w, it is clearly corrupted. Is the transfer to host not being synchronized for some reason?
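
One possible explanation, sketched here under assumptions rather than confirmed: if sample_indices and action_probs are produced by kernels queued on the current (default) stream, the side stream s is not ordered after that work, so the copy may read the tensors before they are fully written. s.synchronize() only waits for work queued on s itself. The helper name copy_to_host_on_side_stream below is hypothetical; the calls it uses (torch.cuda.Stream, Stream.wait_stream, torch.cuda.current_stream) are standard PyTorch.

```python
import torch


def copy_to_host_on_side_stream(*tensors):
    """Copy tensors to host NumPy arrays using a side CUDA stream.

    Hypothetical helper for illustration; assumes the tensors were
    produced by work queued on the current stream.
    """
    if not torch.cuda.is_available():
        # CPU fallback so the sketch still runs without a GPU.
        return [t.detach().cpu().numpy() for t in tensors]

    side = torch.cuda.Stream()
    # Make the side stream wait for everything already queued on the
    # current stream, so the producing kernels finish before the copies.
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        host = [t.detach().cpu() for t in tensors]
    # Block the host until the copies queued on the side stream are done.
    side.synchronize()
    return [h.numpy() for h in host]
```

With this, the two transfers from the snippet above would become w, q = copy_to_host_on_side_stream(sample_indices, action_probs). If the corruption goes away, the missing ordering between the two streams was likely the cause.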