Hello,

I am seeing different inference times with different weights on the same MobileNetV3 model in PyTorch.

The following code is used to measure time:

```
import time

import cv2
import torch
from torchvision import transforms

# `model` is the MobileNetV3 instance, constructed before this snippet.
transform_fn = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
image = cv2.imread('test-image.jpg')
image = transform_fn(cv2.resize(image, (96, 96), interpolation=cv2.INTER_LINEAR)).unsqueeze(0).cpu()
model.load_state_dict(torch.load('model_path', map_location='cpu'))
model.eval()
count = 100
with torch.no_grad():
    start_ts = time.time()
    for i in range(count):
        outputs = model(image)
    elapsed = (time.time() - start_ts) * 1000
# Average time per forward pass, in milliseconds.
print('elapsed time:', elapsed / count)
```
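In case the measurement itself is part of the problem, here is a minimal sketch of a slightly more careful timer. It is only an illustration, not what produced the numbers in this post: `benchmark` and its `warmup` and `iters` parameters are hypothetical names. It discards a few warm-up calls and times each call with `time.perf_counter`, reporting both mean and median so a few slow outliers are easier to spot.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Return (mean_ms, median_ms) over `iters` timed calls of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.median(samples_ms)
```

With the `model` and `image` from the snippet above, this could be called inside `torch.no_grad()` as `benchmark(lambda: model(image))`.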

I suspected the difference might be caused by the number of zero-valued parameters in the model, so I counted the near-zero parameters in both sets of weights. However, it turned out that the pretrained weights, which run faster, have fewer zero parameters.

**Pretrained:**

Time: 8ms

Zero-params number (eps=1e-6): 541

Zero-params number (eps=1e-2): 114071

**After 100 epochs of training from the pretrained weights:**

Time: 20ms

Zero-params number (eps=1e-6): 1538906

Zero-params number (eps=1e-2): 1619014

Zero parameters counting code:

```
eps = 1e-2
zero_cnt = 0
for param in model.parameters():
    # Count entries whose absolute value is below eps.
    zero_cnt += (param.abs() < eps).sum().item()
print('pretrained zero params:', zero_cnt)
```
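To see where the near-zero values are concentrated, the same count could be broken down per parameter tensor. This is a rough sketch only: `near_zero_by_layer` is a hypothetical helper, and the `nn.Linear` model is a stand-in used for illustration, not the MobileNetV3 from this post.

```python
import torch
from torch import nn


def near_zero_by_layer(model, eps=1e-6):
    """Map each parameter name to (near_zero_count, total_count)."""
    counts = {}
    for name, param in model.named_parameters():
        near_zero = int((param.detach().abs() < eps).sum().item())
        counts[name] = (near_zero, param.numel())
    return counts


# Stand-in model for illustration only (assumption, not the real network).
toy = nn.Linear(4, 2)
with torch.no_grad():
    toy.weight.zero_()   # 8 weights, all exactly zero
    toy.bias.fill_(1.0)  # 2 biases, none near zero

for name, (near_zero, total) in near_zero_by_layer(toy).items():
    print(f"{name}: {near_zero}/{total}")
```

Running the same helper on both checkpoints would show whether the extra near-zero values sit in a few layers or are spread across the whole network.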

Does anybody know why this happens?

Thanks.