Any reason for using 2 MB in CUDACachingAllocator?

When I checked the code, I saw that CUDACachingAllocator tries to serve small GPU allocations best-fit from cached blocks of 2 MB.
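To make the question concrete, here is a rough pure-Python sketch of what I understand the rounding logic to be. The constant names and the exact thresholds (512 B rounding, 1 MB "small" cutoff, 20 MB large buffer) are my reading of the source, not an official API, so please correct me if they are wrong:

```python
KB = 1024
MB = 1024 * KB

K_MIN_BLOCK = 512        # all request sizes rounded to multiples of 512 B (my assumption)
K_SMALL_SIZE = 1 * MB    # requests up to 1 MB count as "small" (my assumption)
K_SMALL_BUFFER = 2 * MB  # small requests draw from cached 2 MB blocks
K_LARGE_BUFFER = 20 * MB # mid-size requests draw from 20 MB blocks (my assumption)

def round_size(size: int) -> int:
    """Round a request up to the allocator's internal 512 B granularity."""
    if size < K_MIN_BLOCK:
        return K_MIN_BLOCK
    return ((size + K_MIN_BLOCK - 1) // K_MIN_BLOCK) * K_MIN_BLOCK

def alloc_size(size: int) -> int:
    """Size of the cudaMalloc block that would back this request."""
    rounded = round_size(size)
    if rounded <= K_SMALL_SIZE:
        # Many small tensors share one 2 MB block, so cudaMalloc is
        # called far less often than torch.empty() is.
        return K_SMALL_BUFFER
    if rounded < 10 * MB:
        return K_LARGE_BUFFER
    # Very large requests: round up to a 2 MB multiple.
    return ((rounded + K_SMALL_BUFFER - 1) // K_SMALL_BUFFER) * K_SMALL_BUFFER
```

So, for example, a 100 B tensor and a 1 MB tensor would both be backed by a 2 MB block, which is the behavior my question is about.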

I found things like "x86 supports 4 KB, 2 MB, and 1 GB pages" and "the CUDA driver needs pinned memory for cudaMemcpy, so the PyTorch CPU allocator uses cudaHostAlloc"…

So my guess is that 2 MB is simply a convenient size for managing memory between the CPU and GPU.

Is there a more specific reason for using a 2 MB size?