The problem
When PyTorch loads a model, glibc's malloc carves the large host-side allocations out of heap arenas. When you unload with del model + gc.collect() + torch.cuda.empty_cache(), Python releases its references, but glibc keeps the arenas: small residual allocations pin entire chunks, and the allocator can only return free space sitting at the top of the heap. Memory grows with every model switch and is never handed back to the OS.
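For reference, this is the pattern that leaks. A minimal sketch, assuming diffusers; the checkpoint names are placeholders:

import gc
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("checkpoint-a", torch_dtype=torch.float16).to("cuda")
# ... run inference ...

# The textbook unload:
del pipe                   # drop the last Python reference
gc.collect()               # collect any cycles still holding tensors
torch.cuda.empty_cache()   # hand cached VRAM back to the driver

# VRAM is released, but host RSS stays high: glibc keeps the arena
# pages that backed the checkpoint's CPU-side buffers.
pipe = DiffusionPipeline.from_pretrained("checkpoint-b", torch_dtype=torch.float16).to("cuda")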
This affects anyone running long-lived inference servers, Gradio apps, ComfyUI, or any pipeline that loads/unloads multiple models.
The fix
export MALLOC_MMAP_THRESHOLD_=65536
export MALLOC_TRIM_THRESHOLD_=65536
Set these before launching Python; glibc reads them once, when the allocator initializes. The first forces every allocation of 64 KB or more to be served by mmap() instead of the arenas, and mmap'd pages go straight back to the OS on free, so nothing is left behind to fragment. The second makes glibc trim the heap whenever more than 64 KB sits free at its top.
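If you can't control the launch command, one workaround is to have the script re-exec itself with the variables set before torch is imported. A sketch; the guard itself is my illustration, not part of the fix:

import os
import sys

# glibc reads these tunables once, when the allocator initializes,
# so os.environ assignments made after startup have no effect.
REQUIRED = {
    "MALLOC_MMAP_THRESHOLD_": "65536",
    "MALLOC_TRIM_THRESHOLD_": "65536",
}

if any(os.environ.get(k) != v for k, v in REQUIRED.items()):
    # Restart the interpreter with the thresholds in place.
    os.execve(sys.executable, [sys.executable] + sys.argv, {**os.environ, **REQUIRED})

import torch  # safe to import now; malloc is tuned from the first allocation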
Proof
- Before: RSS grew ~3GB per model switch, hitting OOM after 17 hours and 107 switches
- After: RSS stayed flat at 955MB across 107 consecutive switches between 13 different checkpoints (including SDXL, Flux, PixArt, SD 1.5, and Playground v2.5)
- Tested with diffusers/FastAPI on an AMD RX 7800 XT (ROCm) and an NVIDIA GTX 1080 Ti (CUDA); one way to log RSS per switch is sketched below
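A minimal sketch of that kind of per-switch RSS logging (Linux only; the loop is illustrative, not the exact harness from the write-up):

def rss_mib() -> float:
    # Resident set size of this process, read from /proc (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # value is in kB
    raise RuntimeError("VmRSS not found")

# In the serving loop, log RSS after every unload:
# pipe = load(ckpt); run(pipe); unload(pipe)
# print(f"switch {i}: RSS = {rss_mib():.0f} MiB")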
No code changes. No hook removal. No gc hacks. Just two environment variables.
Full write-up with methodology and data: https://github.com/brjen/pytorch-memory-fix