Moving tensor to CUDA

@ptrblck, but what if I don’t touch CUDA_LAUNCH_BLOCKING and only set the non_blocking argument to False?
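
To make the question concrete, this is roughly the call I mean. It is only a minimal sketch: `move_to_device` is a hypothetical helper name I made up for illustration, CUDA_LAUNCH_BLOCKING is assumed to be unset in the environment, and the CPU fallback is there only so the snippet runs on a machine without a GPU.

```python
import torch


def move_to_device(x, non_blocking=False):
    # Hypothetical helper: pick the GPU if one is available, otherwise stay on CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # non_blocking=False is also the default value for Tensor.to().
    return x.to(device, non_blocking=non_blocking)


x = torch.randn(4, 4)
# CUDA_LAUNCH_BLOCKING is not set anywhere; only non_blocking is passed explicitly.
y = move_to_device(x, non_blocking=False)
print(y.device)
```
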