Is there any way to release pinned CPU tensors back to the OS immediately?

Pinned CPU tensors don't seem to be released back to the OS immediately after they are deleted. Here is a script that reproduces this:

import torch
import gc
import ctypes
import psutil
import os

def get_memory_usage():
    """Return current process RSS memory usage in MB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / (1024 * 1024)

def trim_memory():
    """Attempt to release unused memory back to the OS using malloc_trim."""
    libc = ctypes.CDLL("libc.so.6")
    libc.malloc_trim(0)

# Initial memory usage
print(f"[Before allocation] Memory usage: {get_memory_usage():.2f} MB")

# Allocate 1 GiB of pinned memory on CPU
x = torch.empty(1024 * 1024 * 1024, dtype=torch.uint8, device="cpu", pin_memory=True)
print(f"[After allocation] Memory usage: {get_memory_usage():.2f} MB")

# Delete the tensor
del x

# Clear the CUDA caching allocator's cache
torch.cuda.empty_cache()

# Run garbage collection
gc.collect()

# Try to trim memory
trim_memory()

print(f"[After del + gc + malloc_trim] Memory usage: {get_memory_usage():.2f} MB")

I believe torch.cuda.empty_cache() only applies to the CUDA caching allocator, whereas pinned host memory is allocated through the host caching allocator.
Can you try torch._C._host_emptyCache() instead of torch.cuda.empty_cache()?
It may also be worth opening an issue to request a public Python API for it, since there seem to be reasonable use cases.
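
For reference, here is a minimal sketch of how that suggestion could be wrapped so it degrades gracefully. The helper name release_pinned_host_cache is made up for illustration, and torch._C._host_emptyCache is a private binding that may be missing or renamed depending on the PyTorch build, so the sketch looks it up defensively rather than assuming it exists:

```python
import gc


def release_pinned_host_cache():
    """Best-effort attempt to free cached pinned host memory.

    Returns True if the private host-cache hook was found and called,
    False otherwise. The helper name is hypothetical.
    """
    # Drop unreachable Python objects first so their pinned blocks
    # go back to the host caching allocator.
    gc.collect()

    try:
        import torch
    except ImportError:
        return False

    # Pinned memory goes through the CUDA host allocator, so there is
    # nothing to release without a usable CUDA runtime.
    if not torch.cuda.is_available():
        return False

    # Private API: look it up defensively instead of assuming it exists.
    empty_host_cache = getattr(torch._C, "_host_emptyCache", None)
    if empty_host_cache is None:
        return False

    empty_host_cache()
    return True


if __name__ == "__main__":
    print(f"Host cache hook called: {release_pinned_host_cache()}")
```

In the script above, this would be called after del x, in place of torch.cuda.empty_cache(), before measuring RSS again. Whether RSS actually drops will depend on the PyTorch version and allocator settings.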