Why does the same network get different inference times when initialized with random values versus pre-trained weights?

I built a network and trained it to convergence. Now, when I measure the inference time of the network initialized with the pre-trained weights, it is slower than the same network initialized with random values, in PyTorch 0.4.0.

Then I checked the weights under the two conditions.
Case A, initialized with random values:
the printed values look like fixed-point (plain decimal) numbers.

Case B, initialized with pre-trained weights:
the printed values look like floating-point (scientific notation) numbers.
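
To make the difference concrete, here is a small illustrative snippet (the values are made up, not my actual weights): both tensors have dtype float32, and PyTorch simply switches to scientific notation when it prints a tensor containing very small magnitudes.

import torch

# Case A style: values around 1e-1, printed in plain decimal ("fixed-point") form
rand_w = 0.1 * torch.randn(2, 3)
print(rand_w)
print(rand_w.dtype)   # torch.float32

# Case B style: weights containing very small magnitudes are printed in
# scientific ("floating-point") notation, but the dtype is still float32
tiny_w = torch.tensor([[3.2e-01, -7.3e-40, 4.6e-35],
                       [6.9e-12,  9.9e-23, -3.3e-15]])
print(tiny_w)
print(tiny_w.dtype)   # torch.float32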

I'm not very familiar with floating-point arithmetic in PyTorch or other frameworks, but my intuition is that there may be some issue here?

Thank you for your time and suggestion.

How did you time your script?
Does your model run on CPU or GPU?
Could you post an executable script? We've had a similar issue here and realized the model just behaved differently with random weights (i.e. it created more detection candidates).

Hi, friends
The model runs on the CPU.
I just build a network (e.g. ResNet), create a random input tensor, and immediately run a forward pass to measure inference time; it is fast, about 150 ms.
Then I train the network on a dataset (Cityscapes) until convergence and measure inference time on the CPU again, this time initializing the network with the trained weights; it is slower, about 200 ms.
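
Roughly, the comparison looks like the following sketch (a simplified illustration, not my exact script; resnet18 from torchvision and the checkpoint path are placeholders for my own network and weights):

import time
import torch
import torchvision.models as models

def time_forward(model, x, warmup=3, runs=20):
    # Average wall-clock time of a single forward pass on the CPU.
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.time()
        for _ in range(runs):
            model(x)
    return (time.time() - start) / runs

x = torch.randn(1, 3, 224, 224)
net = models.resnet18()

# Case A: randomly initialized weights (about 150 ms for my network)
print("random init: %.1f ms" % (time_forward(net, x) * 1000))

# Case B: the same architecture initialized with trained weights (about 200 ms)
net.load_state_dict(torch.load("trained_weights.pth"))  # placeholder checkpoint path
print("trained weights: %.1f ms" % (time_forward(net, x) * 1000))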

I found that the weight parameters differ between the two cases, as shown above.

Additionally, I designed a test case to reproduce the phenomenon, shown below:
computing with b is slower than computing with a!
So, I'd like to confirm whether or not this phenomenon is caused by float precision (fixed-point vs. floating-point values) in PyTorch.

import torch
import time
import random

# values spanning a very wide range of magnitudes,
# including some extremely small ones (e.g. -7.3422e-40)
random_num = [0.53, 0.3432, 0.3222e-1, 0.3562e-3, -9.4324e-7, 7.6534e-6,
              0.6734e-10, -3.3432e-15, -7.3422e-40, 4.6237e-35, -0.0632e-12,
              -0.0032e-2, -6.8921e-9, 8.3423e-11, 0.0342e-2, 6.9432e-12, 0.0999e-21]


if __name__ == "__main__":
    # loop count
    loop_num = 1000
    w = 200
    h = 200

    # case A: a normally distributed random tensor a
    a = torch.randn((w, h))

    print(a)
    print(a.dtype)
    st = time.time()
    for i in range(loop_num):
        a = a * a + a
    print("a consuming time = %.6f\n" % (time.time() - st))

    # case B: a tensor b of the same shape, filled with the tiny values above
    b = torch.randn((w, h))
    for i in range(w):
        for j in range(h):
            b[i, j] = float(random_num[random.randrange(0, len(random_num))])
    print(b)
    print(b.dtype)
    st1 = time.time()
    for i in range(loop_num):
        b = b * b + b
    print("b consuming time = %.6f" % (time.time() - st1))
    print('over !!!')

Thanks for the code!
I could reproduce the timing difference.
Unfortunately, I don't know the exact reason for this. I couldn't find out, e.g., whether Kahan summation is used in PyTorch or the underlying linear algebra libraries. Let's wait for someone more familiar with the implementations. :wink:

Thank you very much for your work!
I will try to find out the reason and then share it.

Maybe this is caused by the PyTorch framework, because the two matrices have nearly identical inference times in TensorFlow; a more convincing explanation will have to wait for an official answer.

Hi, have you solved this issue? I have the same question.

Hi,
I guess the issue is caused by the PyTorch framework, because the phenomenon doesn't happen in TensorFlow.
In the end, as a workaround I force every value with magnitude below 1.0e-6 to 0.0, and the model's accuracy is not affected.
I hope this works for you. Thanks.
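
For example, something like the following (a rough sketch of the workaround; the helper name and the 1.0e-6 threshold are just what I used, adjust as needed):

import torch

def flush_tiny_weights(model, threshold=1.0e-6):
    # Set every parameter entry whose magnitude is below the threshold to exactly 0.0.
    # Float32 values this small are in (or quickly decay into) the subnormal range,
    # which many CPUs process much more slowly than normal floats.
    with torch.no_grad():
        for param in model.parameters():
            param[param.abs() < threshold] = 0.0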

I will try it out. Thank you very much~