I will preface this by saying that I am new to PyTorch and parallel computing. I have a PyTorch training script that uses Optuna for parallel hyperparameter search. I run it on a Mac Studio with an M1 Ultra, which has 20 CPUs, with n_jobs set to 20. All CPUs show utilization above 95% once every job starts training.
I also run the exact same code on a Linux virtual machine with 20 CPUs, which sits on a corporate compute grid. There, running the script with n_jobs set to 20 is 2-3x slower than the Mac run. I checked CPU usage with htop and it is generally low, around 30-40%. How can I debug this to find the bottleneck?
The discrepancy between the two runs is strange; I am using the same Python and package versions on both machines.
Your Linux VM is running on a corporate compute grid, which likely means:
The VM’s 20 “CPUs” might be virtual CPUs (vCPUs) rather than dedicated physical cores
These vCPUs could be spread across multiple physical hosts or NUMA nodes
The VM might be competing with other VMs for physical resources
There could be additional virtualization overhead
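To test the third point (whether the VM is actually getting the physical CPU time it asks for), you can sample the "steal" time, which is the share of time the hypervisor gave your vCPUs to other guests. This is a minimal sketch using psutil; the steal and iowait fields only exist on Linux, so they are read defensively:

import psutil

# Sample system-wide CPU time percentages over a short window.
# Consistently high "steal" means the VM is competing with other
# guests for physical cores; it should be ~0 on bare metal.
times = psutil.cpu_times_percent(interval=5)
print(f"user:   {times.user:.1f}%")
print(f"system: {times.system:.1f}%")
print(f"iowait: {getattr(times, 'iowait', 0.0):.1f}%")   # Linux only
print(f"steal:  {getattr(times, 'steal', 0.0):.1f}%")    # Linux only

Run it on the VM while the Optuna study is busy; a steal value of more than a few percent points at contention on the host rather than at your code.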
Beyond that, you can try any of the following.
Verify CPU Architecture and Configuration
import os
import platform
import subprocess

import psutil


def diagnose_cpu_configuration():
    """Comprehensive CPU configuration diagnosis."""
    print("=== System Information ===")
    print(f"Platform: {platform.platform()}")
    print(f"Processor: {platform.processor()}")
    print(f"Python version: {platform.python_version()}")

    print("\n=== CPU Configuration ===")
    print(f"Physical cores: {psutil.cpu_count(logical=False)}")
    print(f"Logical cores: {psutil.cpu_count(logical=True)}")

    # Check CPU affinity - which cores the process is actually allowed to use
    try:
        affinity = os.sched_getaffinity(0)
        print(f"CPU affinity (cores available to this process): {sorted(affinity)}")
        print(f"Number of cores available: {len(affinity)}")
    except AttributeError:
        print("CPU affinity check not available on this system")

    # Check for CPU frequency scaling
    print("\n=== CPU Frequency ===")
    freq = psutil.cpu_freq()
    if freq:
        print(f"Current: {freq.current:.2f} MHz")
        print(f"Min: {freq.min:.2f} MHz")
        print(f"Max: {freq.max:.2f} MHz")

    # Check for NUMA configuration on Linux
    if platform.system() == "Linux":
        try:
            numa_info = subprocess.check_output("numactl --hardware", shell=True, text=True)
            print("\n=== NUMA Configuration ===")
            print(numa_info)
        except subprocess.CalledProcessError:
            print("NUMA information not available")

        # Check for CPU throttling or power management
        try:
            governor = subprocess.check_output(
                "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
                shell=True, text=True
            ).strip()
            print(f"\nCPU Governor: {governor}")
        except subprocess.CalledProcessError:
            pass


diagnose_cpu_configuration()
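Another thing worth ruling out on the Linux box is thread oversubscription. With Optuna's n_jobs=20, each trial may also spawn its own pool of intra-op PyTorch/OpenMP threads (often one per core by default), so 20 trials times 20 threads can thrash a 20-vCPU VM and show up exactly as low per-core utilization. This is a minimal sketch to inspect and cap the per-trial thread count; whether capping it to 1 is right for your workload is an assumption to verify:

import os
import torch

# How many intra-op threads each PyTorch process will use by default.
# If this is ~20 and Optuna runs 20 trials in parallel, you are asking
# for ~400 busy threads on 20 vCPUs.
print("torch intra-op threads:", torch.get_num_threads())
print("torch inter-op threads:", torch.get_num_interop_threads())
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))
print("MKL_NUM_THREADS:", os.environ.get("MKL_NUM_THREADS"))

# One common mitigation: give each parallel trial a single thread so that
# n_jobs trials together use roughly n_jobs cores.
torch.set_num_threads(1)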
Or you can monitor resource utilization during training. Most importantly, check the CUDA, PyTorch, and NumPy configurations on both machines.
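For that comparison, a quick approach is to dump the build and backend details of PyTorch and NumPy on both machines and diff the output; a different BLAS backend (for example Accelerate on the Mac versus a generic OpenBLAS or MKL build on the VM) can by itself explain a large CPU speed gap. A minimal sketch:

import numpy as np
import torch

# Dump build/backend details so the Mac and Linux outputs can be diffed.
print(torch.__version__)
print(torch.__config__.show())          # compiler, BLAS/LAPACK, OpenMP, MKL-DNN, etc.
print("MKL available:", torch.backends.mkl.is_available())
print("MKL-DNN available:", torch.backends.mkldnn.is_available())
np.show_config()                        # which BLAS/LAPACK NumPy was linked against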