Different inference time with different weights on the same model


I see different inference times when loading different weights into the same MobileNetV3 model in PyTorch.

This is the code I use to measure the time:

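import time

import cv2
import torch
from torchvision import transforms

# Convert the HWC uint8 image to a CHW float tensor, then normalize with ImageNet statistics
transform_fn = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
])
image = cv2.imread('test-image.jpg')
image = transform_fn(cv2.resize(image, (96, 96), interpolation=cv2.INTER_LINEAR)).unsqueeze(0).cpu()

# `model` is the MobileNetV3 instance, defined elsewhere
model.load_state_dict(torch.load('model_path', map_location='cpu'))
model.eval()  # inference mode for batch norm / dropout

count = 100
with torch.no_grad():
    start_ts = time.perf_counter()
    for i in range(count):
        outputs = model(image)
    elapsed = (time.perf_counter() - start_ts) * 1000  # total time in ms
    print('elapsed time per run (ms):', elapsed / count)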

I suspected it was caused by the number of zero parameters in the model, so I counted the near-zero parameters in each set of weights. But it turned out that the pretrained weights, which run faster, have fewer zero parameters.

Pretrained:
Time per run: 8 ms
Zero-params number (eps=1e-6): 541
Zero-params number (eps=1e-2): 114071

After 100 epochs of training starting from the pretrained weights:
Time per run: 20 ms
Zero-params number (eps=1e-6): 1538906
Zero-params number (eps=1e-2): 1619014

Code used to count the zero parameters:

import torch

eps = 1e-2  # magnitude below which a parameter counts as "zero"
zero_cnt = 0
for param in model.parameters():
    # number of entries in this tensor whose magnitude is below eps
    zero_cnt += (torch.abs(param) < eps).sum().item()
print('zero params:', zero_cnt)

Does anybody know why this happens?


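You might see a performance hit if you are handling a lot of denormal values (nonzero floats too small to be stored in normalized form, which many CPUs process much more slowly).
Set torch.set_flush_denormal(True) and run the code again.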
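The eps thresholds used above (1e-6, 1e-2) also count many ordinary small weights, so they do not measure denormals directly. A float32 value is denormal when it is nonzero and its magnitude is below `torch.finfo(torch.float32).tiny` (about 1.18e-38). The sketch below counts exactly those entries; `count_denormals` is a hypothetical helper, and a small `nn.Linear` with two planted values stands in for the real MobileNetV3.

```python
import torch
import torch.nn as nn


def count_denormals(model: nn.Module) -> int:
    """Count nonzero floating-point parameter entries below the smallest normal value."""
    total = 0
    for param in model.parameters():
        if not param.is_floating_point():
            continue
        tiny = torch.finfo(param.dtype).tiny  # smallest positive normal number for this dtype
        mag = param.detach().abs()
        total += int(((mag > 0) & (mag < tiny)).sum().item())
    return total


# Demo: plant two denormal values in a small layer (stand-in for the real model)
layer = nn.Linear(4, 4)
with torch.no_grad():
    layer.weight[0, 0] = 1e-40   # below ~1.18e-38, so denormal in float32
    layer.weight[1, 1] = -1e-42  # also denormal, negative sign
print('denormal params:', count_denormals(layer))
```

Running this on the two checkpoints would show whether the slower weights really contain more denormals, rather than just more small values.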

Thanks. It’s caused by denormalized floats.

Either of these two approaches solves the problem:

  1. torch.set_flush_denormal(True)
  2. Set all parameters close to zero to exactly zero:
for param in model.parameters():
    # eps is the same threshold used in the zero-counting code above
    param.data = torch.where(torch.abs(param) < eps, torch.zeros_like(param), param)
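A narrower variant of option 2, offered as an assumption rather than what was tested in this thread: instead of a hand-picked eps (a large value such as 1e-2 also zeroes ordinary small weights and can change the model's outputs), zero only the entries that are actually denormal. `flush_denormal_params` is a hypothetical helper name, and the `nn.Linear` layer is a stand-in for the real model.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def flush_denormal_params(model: nn.Module) -> int:
    """Set denormal parameter entries to exactly zero in place; return how many were changed."""
    flushed = 0
    for param in model.parameters():
        if not param.is_floating_point():
            continue
        tiny = torch.finfo(param.dtype).tiny  # smallest positive normal value
        mask = (param.abs() < tiny) & (param != 0)  # denormal entries only
        flushed += int(mask.sum().item())
        param.masked_fill_(mask, 0.0)
    return flushed


layer = nn.Linear(4, 4)
with torch.no_grad():
    layer.weight[2, 3] = 5e-39  # denormal in float32
print('flushed:', flush_denormal_params(layer))  # a second call would flush nothing
```

Because only values below the smallest normal float are touched, the model's outputs should be unchanged for practical purposes, unlike a large eps.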

Is there any way to prevent the trained weights from containing denormal values during training? We trained a model with a ResNet50 backbone using both PyTorch v1.0.0a0 and v1.4.0. The model trained with v1.4.0 has denormal values in its weights, while the one trained with v1.0.0a0 does not. Do you know the reason?
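The thread does not answer this. One possible workaround, sketched under assumptions and untested on the ResNet50 setup described: after every optimizer step, zero any parameter entries that have fallen into the denormal range, so the saved weights contain none. `torch.set_flush_denormal(True)` only affects CPU computation, so it would not cover GPU training. The tiny network, random data, and optimizer settings below are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
loss_fn = nn.MSELoss()

for step in range(20):
    x = torch.randn(32, 8)  # placeholder batch
    y = torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # After each update, snap any entry below the smallest normal float to exactly zero
    with torch.no_grad():
        for param in model.parameters():
            tiny = torch.finfo(param.dtype).tiny
            param.masked_fill_(param.abs() < tiny, 0.0)
```

The extra pass over the parameters is cheap compared with the forward and backward passes, and it works the same way on CPU and GPU tensors.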


@bsting Did you find any useful resources for avoiding denormal values during training? I am also curious.
