PyTorch inference slow on CPU when I modified torchvision.models.mobilenet_v2

Hi,

I’m trying to run inference on CPU with torchvision.models.mobilenet_v2.
If I load a pretrained mobilenet_v2, remove some of its layers, and retrain, inference is very slow when I use my new weights. Here is the code:

import os
import time

import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

device = "cpu"
parameters = "xxx.pth"
model = models.mobilenet_v2(pretrained=False)
# Keep only the first 15 feature blocks (features[14] outputs 160 channels)
model.features = nn.Sequential(*[model.features[i] for i in range(15)])
model.classifier = nn.Linear(160, 2, bias=True)
params = torch.load(parameters, map_location=device)
model.load_state_dict(params['state_dict'], strict=True)
model.to(device)
model.eval()

test_pic = os.listdir("xxx/test_pic")
for img_name in test_pic:
    img_path = os.path.join("xxx/test_pic", img_name)
    img = cv2.imread(img_path)
    with torch.no_grad():
        img = img.astype(np.float32)
        img = img[:, :, (2, 1, 0)]  # BGR -> RGB
        img = np.transpose(img, (2, 0, 1))  # HWC -> CHW
        img = torch.tensor(img)
        img = img.unsqueeze(dim=0)  # add batch dimension
        img = img.to(device)
        st = time.time()
        out = model(img)
        print(time.time() - st)

In this case, the time cost is about 1 s per inference.
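(As an aside: one-off CPU timings like this can be noisy — the first forward pass often pays one-time setup costs. A common pattern is to warm the model up and average over several runs. A minimal sketch, using a cheap stand-in workload instead of the real model:)

```python
import time

def benchmark(fn, warmup=3, runs=10):
    """Average wall-clock seconds per call of fn(), after warm-up calls."""
    for _ in range(warmup):
        fn()  # discard warm-up runs (caches, lazy allocations, etc.)
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Example with a trivial workload; in the real script this would be
# something like: benchmark(lambda: model(img))
avg = benchmark(lambda: sum(i * i for i in range(10000)))
print(f"{avg * 1000:.3f} ms per call")
```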

And if I don’t change anything in the MobileNet, here is the code:

device = "cpu"
model = models.mobilenet_v2(pretrained=True)
model.to(device)
model.eval()

test_pic = os.listdir("xxx/test_pic")
for img_name in test_pic:
    img_path = os.path.join("xxx/test_pic", img_name)
    img = cv2.imread(img_path)
    with torch.no_grad():
        img = img.astype(np.float32)
        img = img[:, :, (2, 1, 0)]  # BGR -> RGB
        img = np.transpose(img, (2, 0, 1))  # HWC -> CHW
        img = torch.tensor(img)
        img = img.unsqueeze(dim=0)  # add batch dimension
        img = img.to(device)
        st = time.time()
        out = model(img)
        print(time.time() - st)

The time cost drops to about 150 ms per inference.

I’ve also checked the file sizes: my checkpoint is 2.9 MB, while the official pretrained model is 13.55 MB.

Is this normal? Is there something wrong with my model? Any ideas would be appreciated.

Thanks

Depending on your CPU, you might be able to use torch.set_flush_denormal(True) to avoid the slower code path taken for denormal values (if denormals are causing the slowdown, which could be the case based on your description).
