Some questions about the malloc Block function in torch

In the Block* malloc function, when the requested size is less than 1 MB, get_pool returns the small_blocks BlockPool. However, get_allocation_size will return 2 MB.
So why does get_allocation_size round sizes below 1 MB up to 2 MB, and sizes below 10 MB up to 20 MB?

Block* malloc(int device, size_t orig_size, cudaStream_t stream) {
...
    size_t size = round_size(orig_size);                  // round the request up to a 512-byte multiple
    auto& pool = get_pool(size, stream);                   // small_blocks for small requests, else large_blocks
    const size_t alloc_size = get_allocation_size(size);   // size that would actually be passed to cudaMalloc
    AllocParams params(device, size, stream, &pool, alloc_size, stats);

    bool block_found =
        // Search pool
        get_free_block(params)
        // Trigger callbacks and retry search
        || (trigger_free_memory_callbacks(params) && get_free_block(params));
...
}
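
For reference, the rounding I am asking about looks roughly like the sketch below. The constant names follow the caching allocator source (kSmallSize = 1 MiB, kSmallBuffer = 2 MiB, kMinLargeAlloc = 10 MiB, kLargeBuffer = 20 MiB, kRoundLarge = 2 MiB); the real implementation may differ in detail.

// Sketch of the rounding behavior described above.
constexpr size_t kSmallSize     = 1048576;   // 1 MiB: largest "small" request
constexpr size_t kSmallBuffer   = 2097152;   // 2 MiB: buffer size used for small requests
constexpr size_t kMinLargeAlloc = 10485760;  // 10 MiB: requests below this get a 20 MiB buffer
constexpr size_t kLargeBuffer   = 20971520;  // 20 MiB
constexpr size_t kRoundLarge    = 2097152;   // larger requests round to multiples of 2 MiB

size_t get_allocation_size(size_t size) {
  if (size <= kSmallSize) {
    return kSmallBuffer;
  } else if (size < kMinLargeAlloc) {
    return kLargeBuffer;
  } else {
    return kRoundLarge * ((size + kRoundLarge - 1) / kRoundLarge);
  }
}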

The caching allocator uses a small and a large pool to create blocks that are hopefully reusable. Creating a custom allocation with the exact size for each tensor might limit the ability to reuse it and thus potentially increase memory fragmentation. You can adapt the sizes via an env variable if needed.
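
To make the reuse argument concrete, here is a toy sketch (not the actual allocator code; the round_alloc helper and the multimap cache are hypothetical stand-ins) showing why caching freed blocks by a rounded bucket size lets a later, differently sized request hit the cache instead of forcing another cudaMalloc:

#include <cstddef>
#include <cstdio>
#include <map>

// Hypothetical cache keyed by the rounded allocation size, for illustration only.
constexpr size_t kSmallBuffer = 2 * 1024 * 1024;  // 2 MiB bucket for small requests

size_t round_alloc(size_t size) {
  // Simplified stand-in for get_allocation_size: every small request maps to one bucket.
  return size <= 1024 * 1024 ? kSmallBuffer : size;
}

int main() {
  std::multimap<size_t, int> free_blocks;  // alloc_size -> cached block id

  // A block that backed a 300 KiB tensor is freed and cached under its 2 MiB bucket.
  free_blocks.insert({round_alloc(300 * 1024), /*block id=*/1});

  // A later 900 KiB request rounds to the same bucket, so the cached block is reused.
  auto it = free_blocks.find(round_alloc(900 * 1024));
  if (it != free_blocks.end()) {
    std::printf("reused cached block %d\n", it->second);
  } else {
    std::printf("cache miss: would need a new cudaMalloc\n");
  }
  return 0;
}

If the cache were instead keyed by the exact tensor size, the 900 KiB request would miss even though a large-enough block is sitting idle, which is the fragmentation risk described above.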

Thanks for your explanation, it helps a lot.