torch.compile slow on CPU

Hi,

With PyTorch 2.2.0 on CPU (the issue was the same with version 2.1.2), torch.compile sometimes makes my code several times slower (up to roughly 4x in the timings below). Here are my timings on a toy regression task:

w/o torch.compile

Epoch 0: Loss: 5.15e-01 in 1.3s.
Epoch 1: Loss: 1.91e+01 in 1.3s.
Epoch 2: Loss: 6.26e+03 in 1.3s.
Epoch 3: Loss: 3.59e+04 in 1.4s.
Epoch 4: Loss: 7.14e+04 in 1.3s.
Epoch 5: Loss: 9.38e+05 in 1.3s.
Epoch 6: Loss: 1.49e+02 in 1.3s.
Epoch 7: Loss: 1.01e+00 in 1.3s.
Epoch 8: Loss: 8.76e+00 in 1.3s.
Epoch 9: Loss: 1.25e+00 in 1.3s.
Epoch 10: Loss: 1.39e+00 in 1.3s.
Epoch 11: Loss: 3.83e-01 in 1.3s.
Epoch 12: Loss: 5.26e-01 in 1.3s.
Epoch 13: Loss: 5.29e-01 in 1.3s.
Epoch 14: Loss: 5.71e-01 in 1.3s.
Epoch 15: Loss: 5.71e-01 in 1.3s.
Epoch 16: Loss: 5.55e-01 in 1.3s.
Epoch 17: Loss: 5.40e-01 in 1.4s.
Epoch 18: Loss: 5.25e-01 in 1.4s.
Epoch 19: Loss: 6.16e-01 in 1.4s.
Epoch 20: Loss: 1.37e+00 in 1.4s.

w/ torch.compile

Epoch 0: Loss: 5.15e-01 in 5.6s.
Epoch 1: Loss: 1.91e+01 in 1.3s.
Epoch 2: Loss: 6.26e+03 in 3.5s.
Epoch 3: Loss: 3.59e+04 in 2.4s.
Epoch 4: Loss: 7.14e+04 in 1.8s.
Epoch 5: Loss: 9.37e+05 in 3.6s.
Epoch 6: Loss: 1.49e+02 in 1.3s.
Epoch 7: Loss: 1.00e+00 in 1.3s.
Epoch 8: Loss: 8.85e+00 in 1.3s.
Epoch 9: Loss: 1.27e+00 in 1.3s.
Epoch 10: Loss: 1.39e+00 in 1.3s.
Epoch 11: Loss: 3.79e-01 in 1.4s.
Epoch 12: Loss: 5.18e-01 in 1.5s.
Epoch 13: Loss: 5.27e-01 in 1.5s.
Epoch 14: Loss: 5.69e-01 in 2.3s.
Epoch 15: Loss: 5.68e-01 in 4.5s.
Epoch 16: Loss: 5.53e-01 in 6.1s.
Epoch 17: Loss: 5.42e-01 in 5.6s.
Epoch 18: Loss: 5.36e-01 in 5.4s.
Epoch 19: Loss: 7.87e-01 in 4.8s.
Epoch 20: Loss: 1.98e+00 in 4.6s.

These results can be reproduced with the following code:

import torch
import torch.nn as nn
import numpy as np
from time import time
torch.set_num_threads(2)

input_size = 1
output_size = 1
hidden_size = 1024
num_data = 10000

# seeds
seed = 1234
torch.manual_seed(seed)
np.random.seed(seed)

# hyper-parameters
num_epochs = 50
learning_rate = 0.01

# toy dataset
x_train = np.random.rand(num_data, input_size)
y_train = np.cos(2 * np.pi * x_train) + 0.1 * np.random.randn(num_data, input_size)

# regression model
model = nn.Sequential(nn.Linear(input_size, hidden_size),
                      nn.GELU(),
                      nn.Linear(hidden_size, hidden_size),
                      nn.GELU(),
                      nn.Linear(hidden_size, hidden_size),
                      nn.GELU(),
                      nn.Linear(hidden_size, hidden_size),
                      nn.GELU(),
                      nn.Linear(hidden_size, output_size))

# loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.NAdam(model.parameters(), lr=learning_rate)
use_compile = True
if use_compile:
    model = torch.compile(model, mode="reduce-overhead", fullgraph=True)
    
# train the model
x_train = torch.from_numpy(x_train.astype(np.float32))
y_train = torch.from_numpy(y_train.astype(np.float32))

for epoch in range(num_epochs):
    start_time = time()
    
    # forward pass
    outputs = model(x_train)
    loss = criterion(outputs, y_train)
    
    # backward and optimize
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    
    print(f'Epoch {epoch}: Loss: {loss.item():.2e} in {time()-start_time:.1f}s.')
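
A note on the timings: with torch.compile, epoch 0 presumably includes compilation time, so a fairer steady-state comparison would run a few untimed warm-up steps before the measured loop, e.g. (a sketch):

# hypothetical warm-up: run a few untimed steps so that (re)compilation
# happens before the measured epochs
for _ in range(3):
    warmup_loss = criterion(model(x_train), y_train)
    optimizer.zero_grad(set_to_none=True)
    warmup_loss.backward()
    optimizer.step()

That said, the later epochs (14-20) also get slower in my run, so it is not just first-epoch compile time.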

Is my usage of torch.compile correct? Are there any options I could try to avoid this issue?
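For reference, these are the variants I can think of trying next (a sketch based on the documented torch.compile options; I have not verified that any of them helps on this CPU):

# alternative compile configurations (unverified; option names from the torch.compile docs)
model = torch.compile(model)                         # default mode, without "reduce-overhead"
model = torch.compile(model, mode="max-autotune")    # longer compile, possibly faster steady state
model = torch.compile(model, dynamic=False)          # rule out dynamic-shape recompilations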
Thanks for the help.

I cannot reproduce the issue on my system:

# with compile
Epoch 0: Loss: 5.15e-01 in 1.3s.
Epoch 1: Loss: 1.91e+01 in 0.9s.
Epoch 2: Loss: 6.26e+03 in 0.9s.
Epoch 3: Loss: 3.59e+04 in 0.9s.
Epoch 4: Loss: 7.14e+04 in 0.9s.
Epoch 5: Loss: 9.37e+05 in 0.9s.
Epoch 6: Loss: 1.49e+02 in 0.9s.
Epoch 7: Loss: 9.99e-01 in 0.9s.
Epoch 8: Loss: 8.85e+00 in 0.9s.
Epoch 9: Loss: 1.28e+00 in 0.9s.

# without compile
Epoch 0: Loss: 5.15e-01 in 0.9s.
Epoch 1: Loss: 1.91e+01 in 0.9s.
Epoch 2: Loss: 6.26e+03 in 0.9s.
Epoch 3: Loss: 3.59e+04 in 0.9s.
Epoch 4: Loss: 7.14e+04 in 0.9s.
Epoch 5: Loss: 9.38e+05 in 0.9s.
Epoch 6: Loss: 1.49e+02 in 0.9s.
Epoch 7: Loss: 1.01e+00 in 0.9s.
Epoch 8: Loss: 8.76e+00 in 0.9s.
Epoch 9: Loss: 1.25e+00 in 0.9s.

CC @marksaroufim in case you’ve seen a similar regression.

I can repro this behavior on Google Colab. Which CPU are you using, @Laurent1? I'd imagine this to be more of an issue on lower-end CPUs.
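
One quick way to check whether the slowdown comes from recompilations (a sketch, assuming the torch._logging API available in recent 2.x releases):

# print a message with the reason each time dynamo recompiles
import torch._logging
torch._logging.set_logs(recompiles=True)

# then run the training loop; frequent recompile messages in the later
# epochs would explain the growing step times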

It's an Intel(R) Core™ i7-4700MQ CPU @ 2.40GHz.
I am glad you can reproduce this issue.