Why does the same network get different inference times when initialized with random values versus pre-trained weights?

I built a network and trained it to convergence. Now, when I measure the inference time of the network initialized with the pre-trained weights, it is slower than the same network initialized with random values, in PyTorch 0.4.0.

Then I checked the weights under the two conditions.
Case A, initialized with random values:
the printed values look like fixed-point (plain decimal) numbers.

Case B, initialized with pre-trained weights:
the printed values look like floating-point (scientific notation) numbers.
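
To make the difference concrete, here is a small illustrative snippet (the values are made up, not my actual weights): both tensors have dtype float32, and PyTorch simply switches to scientific notation when it prints a tensor containing very small magnitudes.

import torch

# Case A style: values around 1e-1, printed in plain decimal ("fixed-point") form
rand_w = 0.1 * torch.randn(2, 3)
print(rand_w)
print(rand_w.dtype)   # torch.float32

# Case B style: weights containing very small magnitudes are printed in
# scientific ("floating-point") notation, but the dtype is still float32
tiny_w = torch.tensor([[3.2e-01, -7.3e-40, 4.6e-35],
                       [6.9e-12,  9.9e-23, -3.3e-15]])
print(tiny_w)
print(tiny_w.dtype)   # torch.float32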

I'm not very familiar with floating-point arithmetic in PyTorch or other frameworks, but my intuition is that there may be some issue here?

Thank you for your time and suggestion.

How did you time your script?
Does your model run on CPU or GPU?
Could you post an executable script? We've had a similar issue here and realized the model just behaved differently with random weights (i.e. it created more detection candidates).

Hi, friends
The model runs on the CPU.
I just build a network (e.g. ResNet), create a random input tensor, and immediately run a forward pass to measure inference time; it is fast, about 150 ms.
Then I train the network on a dataset (Cityscapes) until convergence and measure inference time on the CPU again, this time initializing the network with the trained weights; it is slower, about 200 ms.
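
Roughly, the comparison looks like the following sketch (a simplified illustration, not my exact script; resnet18 from torchvision and the checkpoint path are placeholders for my own network and weights):

import time
import torch
import torchvision.models as models

def time_forward(model, x, warmup=3, runs=20):
    # Average wall-clock time of a single forward pass on the CPU.
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.time()
        for _ in range(runs):
            model(x)
    return (time.time() - start) / runs

x = torch.randn(1, 3, 224, 224)
net = models.resnet18()

# Case A: randomly initialized weights (about 150 ms for my network)
print("random init: %.1f ms" % (time_forward(net, x) * 1000))

# Case B: the same architecture initialized with trained weights (about 200 ms)
net.load_state_dict(torch.load("trained_weights.pth"))  # placeholder checkpoint path
print("trained weights: %.1f ms" % (time_forward(net, x) * 1000))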

I found that the weight parameters differ between the two cases, as shown above.

Additionally, I designed a test case to reproduce the phenomenon, shown below:
computing with b is slower than computing with a!
So, I'd like to confirm whether or not this phenomenon is caused by float precision (fixed-point vs. floating-point values) in PyTorch.

import torch
import time
import random

# values spanning a very wide range of magnitudes,
# including some extremely small ones (e.g. -7.3422e-40)
random_num = [0.53, 0.3432, 0.3222e-1, 0.3562e-3, -9.4324e-7, 7.6534e-6,
              0.6734e-10, -3.3432e-15, -7.3422e-40, 4.6237e-35, -0.0632e-12,
              -0.0032e-2, -6.8921e-9, 8.3423e-11, 0.0342e-2, 6.9432e-12, 0.0999e-21]


if __name__ == "__main__":
    # loop count
    loop_num = 1000
    w = 200
    h = 200

    # case A: a normally distributed random tensor a
    a = torch.randn((w, h))

    print(a)
    print(a.dtype)
    st = time.time()
    for i in range(loop_num):
        a = a * a + a
    print("a consuming time = %.6f\n" % (time.time() - st))

    # case B: a tensor b of the same shape, filled with the tiny values above
    b = torch.randn((w, h))
    for i in range(w):
        for j in range(h):
            b[i, j] = float(random_num[random.randrange(0, len(random_num))])
    print(b)
    print(b.dtype)
    st1 = time.time()
    for i in range(loop_num):
        b = b * b + b
    print("b consuming time = %.6f" % (time.time() - st1))
    print('over !!!')

Thanks for the code!
I could reproduce the timing difference.
Unfortunately, I don't know the exact reason for this. I couldn't find out, e.g., whether Kahan summation is used in PyTorch or the underlying linear algebra libraries. Let's wait for someone more familiar with the implementations. :wink:

Thank you very much for your work!
I will try to find out the reason and then share it.

Maybe this is caused by the PyTorch framework, because the two matrices have nearly identical inference times in TensorFlow; a more convincing explanation will have to wait for an official answer.

Hi, have you solved this issue? I have the same question.

Hi,
I guess the issue is caused by the PyTorch framework, because the phenomenon doesn't happen in TensorFlow.
In the end, as a workaround I force every value with magnitude below 1.0e-6 to 0.0, and the model's accuracy is not affected.
I hope this works for you. Thanks.
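
For example, something like the following (a rough sketch of the workaround; the helper name and the 1.0e-6 threshold are just what I used, adjust as needed):

import torch

def flush_tiny_weights(model, threshold=1.0e-6):
    # Set every parameter entry whose magnitude is below the threshold to exactly 0.0.
    # Float32 values this small are in (or quickly decay into) the subnormal range,
    # which many CPUs process much more slowly than normal floats.
    with torch.no_grad():
        for param in model.parameters():
            param[param.abs() < threshold] = 0.0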

I will try it out. Thank you very much~