Why does model loading take so long? (It usually takes 2 seconds to complete)

Hi there,
I am curious why the first job launch takes so much time in PyTorch,
e.g. loading the model onto the CPU and then moving the model to the GPU.

Here is the code snippet to reproduce:

import torch.multiprocessing as mp
from torchvision import models
import torch
import time

def only_import():
    t1 = time.time()

    model = torch.hub.load('pytorch/vision:v0.4.2',
    # util.set_fullname(model, MODEL_NAME)
    t2 = time.time()

    model = model.to('cuda')
    t3 = time.time()

    print("import model takes", (t2 - t1) * 1e3, 'ms')
    print("move model to cuda", (t3 - t2) * 1e3, 'ms')

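def main():
    # Run each measurement in a fresh process so that every run pays the
    # first-time initialization cost again.
    for _ in range(10):
        proc = mp.Process(target=only_import)
        proc.start()
        proc.join()  # wait for this run to finish before starting the next

if __name__ == "__main__":
    main()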
Based on the log, it usually takes 1.5 to 2.2 seconds to load the model onto the CPU, and over 2 seconds to move the model to CUDA.

Is this because PyTorch needs to dynamically load libraries from disk the first time?

(experiment setup: aws-ec2-p3.2xlarge instance, PyTorch 1.3.0)

The first time you use CUDA in a process, the CUDA context is initialized and cuDNN has to allocate some cache, and that takes time. If you first pass a dummy input to the GPU, e.g. torch.randn(4).to(device), you will see normal transfer speed after that.
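
To check the warm-up suggestion above on your own machine, here is a minimal sketch that times the first dummy transfer against a second, identical one. The helper names timed_ms and compare_first_and_later_transfer are made up for this illustration and are not part of PyTorch; the sketch assumes a CUDA-capable GPU and only prints a notice if none is available.

```python
import time


def timed_ms(fn):
    """Run fn() and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1e3


def compare_first_and_later_transfer():
    """Time a warm-up transfer, then a second transfer of the same size."""
    import torch  # imported here so the timing helper works without PyTorch

    if not torch.cuda.is_available():
        print("CUDA is not available; nothing to measure")
        return None

    device = torch.device("cuda")

    def transfer():
        x = torch.randn(4).to(device)
        torch.cuda.synchronize()  # wait until the copy has actually finished
        return x

    _, first_ms = timed_ms(transfer)  # pays the one-time CUDA start-up cost
    _, later_ms = timed_ms(transfer)  # CUDA is already initialized here
    print("first transfer takes", first_ms, "ms")
    print("later transfer takes", later_ms, "ms")
    return first_ms, later_ms
```

Calling compare_first_and_later_transfer() in a fresh process should show whether most of the delay belongs to the first CUDA call rather than to the data transfer itself. The same pattern can be applied to the snippet in the question by doing the dummy transfer before model.to('cuda') and timing the two steps separately.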