CPU usage Mac vs Linux

I will preface by saying I am new to PyTorch and parallel computing. I have a PyTorch training script that uses Optuna to parallelize a hyperparameter search. I run it on a Mac Studio with an M1 Ultra, which has 20 CPUs, with n_jobs set to 20. I see that all CPUs have utilization above 95% once every job starts training.

Similarly, I run the same exact code on a Linux virtual machine with 20 CPUs, on a corporate compute grid. Running the script with n_jobs set to 20 is 2-3x slower than the Mac run. I ran htop to check CPU usage and it is generally low, around 30-40%. How can I debug this to find the bottleneck?

The discrepancy between the two runs is strange; I am using the same Python and package versions on both machines.

Happy to share more info as needed.

Thank you for your help in advance.

Your Linux VM is running on a corporate compute grid, which likely means:

  • The VM’s 20 “CPUs” might be virtual CPUs (vCPUs) rather than dedicated physical cores
  • These vCPUs could be spread across multiple physical hosts or NUMA nodes
  • The VM might be competing with other VMs for physical resources (see the steal-time check after this list)
  • There could be additional virtualization overhead
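
If the hypervisor is overcommitted, the guest accumulates "steal" time: CPU cycles the VM wanted but the host handed to another guest. Here is a minimal sketch of a steal-time check (the check_steal_time helper name is just for illustration; psutil exposes the steal field from /proc/stat on Linux guests):

import psutil

def check_steal_time(interval=5.0):
    """Rough check for hypervisor CPU steal on a Linux guest."""
    # Measure CPU time shares over the sampling interval
    times = psutil.cpu_times_percent(interval=interval)
    steal = getattr(times, "steal", None)  # field only exists on Linux
    if steal is None:
        print("No steal-time counter on this platform")
    else:
        # A sustained value above a few percent suggests the VM is
        # contending with other guests for physical cores
        print(f"CPU steal over {interval:.0f}s: {steal:.1f}%")

check_steal_time()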

You can try any of these:
Verify CPU Architecture and Configuration

import os
import psutil
import platform
import subprocess

def diagnose_cpu_configuration():
    """Comprehensive CPU configuration diagnosis"""
    
    print("=== System Information ===")
    print(f"Platform: {platform.platform()}")
    print(f"Processor: {platform.processor()}")
    print(f"Python version: {platform.python_version()}")
    
    print("\n=== CPU Configuration ===")
    print(f"Physical cores: {psutil.cpu_count(logical=False)}")
    print(f"Logical cores: {psutil.cpu_count(logical=True)}")
    
    # Check CPU affinity - which cores can the process actually use
    try:
        affinity = os.sched_getaffinity(0)
        print(f"CPU affinity (cores available to this process): {sorted(affinity)}")
        print(f"Number of cores available: {len(affinity)}")
    except AttributeError:
        print("CPU affinity check not available on this system")
    
    # Check for CPU frequency scaling
    print("\n=== CPU Frequency ===")
    freq = psutil.cpu_freq()
    if freq:
        print(f"Current: {freq.current:.2f} MHz")
        print(f"Min: {freq.min:.2f} MHz")
        print(f"Max: {freq.max:.2f} MHz")
    
    # Check for NUMA configuration on Linux
    if platform.system() == "Linux":
        try:
            # Avoid shell=True; a missing numactl binary raises FileNotFoundError
            numa_info = subprocess.check_output(["numactl", "--hardware"], text=True)
            print("\n=== NUMA Configuration ===")
            print(numa_info)
        except (subprocess.CalledProcessError, FileNotFoundError):
            print("NUMA information not available")
    
    # Check for CPU throttling or power management
    if platform.system() == "Linux":
        try:
            # Read the sysfs file directly instead of shelling out to cat
            with open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor") as f:
                governor = f.read().strip()
            print(f"\nCPU Governor: {governor}")
        except OSError:
            print("CPU governor information not available")

diagnose_cpu_configuration()
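
When you run this on the VM, compare the reported affinity count against the 20 vCPUs you requested, and check whether the governor is powersave rather than performance; a reduced affinity mask or a conservative governor would both be consistent with the low utilization you are seeing.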

Or you can monitor resource utilization during training. Most importantly, check the CUDA and NumPy configurations on both machines :wink:
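
A common cause of low per-core utilization is the threading configuration of the numeric libraries rather than the grid itself. This minimal sketch (just one way to dump the relevant settings; the dump_numeric_configs name is illustrative) prints what PyTorch and NumPy were built with and how many threads they will use:

import os
import numpy as np
import torch

def dump_numeric_configs():
    """Print build and threading info for PyTorch and NumPy."""
    # Intra-op and inter-op thread pools used by PyTorch CPU kernels
    print(f"torch.get_num_threads(): {torch.get_num_threads()}")
    print(f"torch.get_num_interop_threads(): {torch.get_num_interop_threads()}")

    # Environment variables that BLAS/OpenMP backends respect
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        print(f"{var}={os.environ.get(var, '<unset>')}")

    # Full build configuration (BLAS backend, OpenMP support, etc.)
    print(torch.__config__.show())
    np.show_config()

dump_numeric_configs()

If each of the 20 Optuna workers also spawns a full complement of BLAS/OpenMP threads, the processes can oversubscribe the vCPUs and spend their time context-switching; limiting each worker to one thread (e.g. OMP_NUM_THREADS=1 or torch.set_num_threads(1)) is a common mitigation worth trying.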

Linux usually has lower CPU usage than macOS because it runs fewer background services.