Training forward pass faster than evaluation forward pass

I am trying to measure the runtime of a ResNet-50 forward pass during training and during evaluation. I am running the experiments on a 64-core CPU; no GPU is used. Here is the code I used.

import argparse
import time

import torch
import torch.optim as optim
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", default=128, type=int, help="Batch size.")
parser.add_argument("--num_data", default=1024, type=int, help="Number of fake images.")
args = parser.parse_args()

train_dataset = datasets.FakeData(
    args.num_data, (3, 224, 224), 1000, transforms.ToTensor()
)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, num_workers=1, pin_memory=True
)
test_dataset = datasets.FakeData(
    args.num_data, (3, 224, 224), 1000, transforms.ToTensor()
)
test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=args.batch_size, num_workers=1, pin_memory=True
)

model = torchvision.models.resnet50(pretrained=True)

optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

print("==================== Training ====================")
for i, (images, target) in enumerate(train_loader):

    start = time.time()
    outputs = model(images)
    end = time.time()
    print(f"Train forward time: {(end - start) * 1000.0} ms")

    loss = criterion(outputs, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("==================== Evaluation ====================")
for i, (images, target) in enumerate(test_loader):
    with torch.no_grad():
        start = time.time()
        outputs = model(images)
        end = time.time()
        print(f"Eval forward time: {(end - start) * 1000.0} ms")

I noticed that if I run training first and then evaluation, a forward pass during evaluation is slightly faster than a forward pass during training, which is what I expected. The runtimes are shown below.

==================== Training ====================
Train forward time: 2100.9457111358643 ms
Train forward time: 1974.3893146514893 ms
Train forward time: 1945.4665184020996 ms
Train forward time: 1943.62211227417 ms
Train forward time: 1888.5083198547363 ms
Train forward time: 1859.039068222046 ms
Train forward time: 1811.948537826538 ms
Train forward time: 1805.4358959197998 ms
==================== Evaluation ====================
Eval forward time: 2370.067834854126 ms
Eval forward time: 2061.3937377929688 ms
Eval forward time: 1844.5143699645996 ms
Eval forward time: 1753.0148029327393 ms
Eval forward time: 1701.3907432556152 ms
Eval forward time: 1688.025712966919 ms
Eval forward time: 1813.1353855133057 ms
Eval forward time: 1647.9554176330566 ms

However, if I run evaluation only, by commenting out the training code block, then a forward pass during evaluation is significantly slower, even slower than a forward pass during training was.

==================== Evaluation ====================
Eval forward time: 2793.8458919525146 ms
Eval forward time: 2747.8232383728027 ms
Eval forward time: 2875.753164291382 ms
Eval forward time: 2738.5916709899902 ms
Eval forward time: 2732.877016067505 ms
Eval forward time: 2838.664770126343 ms
Eval forward time: 2783.207893371582 ms
Eval forward time: 2791.349411010742 ms

It looks like running training first somehow speeds up the forward passes during evaluation. How is this possible?