Why does CUDACachingAllocator restrict free-block reuse to the same stream?

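  bool get_free_block(AllocParams& p) {
    BlockPool& pool = *p.pool;
    // Find the first cached block that is not smaller than the search key.
    auto it = pool.lower_bound(&p.search_key);
    // Give up if nothing fits or the candidate belongs to a different stream.
    if (it == pool.end() || (*it)->stream != p.stream())
      return false;
    p.block = *it;
    return true;
  }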

If a block was allocated on one stream and later split, leaving a free remainder, why can't that free block be reused on another stream?
The alternative is a fresh cudaMalloc call, which blocks the host.
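
To make the behaviour I'm asking about concrete, here is a minimal, self-contained sketch of the lookup as I understand it. `StreamId`, `Block`, `BlockLess`, and `find_free_block` are simplified stand-ins I made up for illustration, not the real PyTorch types, and I'm assuming the pool is ordered by stream first and then by size.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for cudaStream_t, so the sketch runs without CUDA.
using StreamId = int;

// Simplified free block: just the fields the lookup needs.
struct Block {
  StreamId stream;
  std::size_t size;
  std::uintptr_t addr;  // 0 in a search key, so the key sorts before real blocks
};

// Assumed ordering: stream first, then size, then address.
struct BlockLess {
  bool operator()(const Block* a, const Block* b) const {
    if (a->stream != b->stream) return a->stream < b->stream;
    if (a->size != b->size) return a->size < b->size;
    return a->addr < b->addr;
  }
};

using BlockPool = std::set<Block*, BlockLess>;

// Returns the smallest free block on `stream` with at least `size` bytes and
// removes it from the pool, or returns nullptr if that stream has none.
Block* find_free_block(BlockPool& pool, StreamId stream, std::size_t size) {
  Block key{stream, size, 0};
  auto it = pool.lower_bound(&key);
  // lower_bound may land on a block from a later stream; reject it.
  if (it == pool.end() || (*it)->stream != stream) return nullptr;
  Block* found = *it;
  pool.erase(it);
  return found;
}
```

With free blocks cached only for stream 1, a request on stream 0 or stream 2 gets nothing back from this lookup even though suitable memory is sitting in the pool, and that is the case where I would expect the allocator to fall through to cudaMalloc.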

Does anyone know the reason for this?