Warning when evaluating a model for the second time

Hello everyone,

First of all, this is my first post on the PyTorch forum, so I would like to say thanks for all the support I've found here since 2018. :slight_smile:

Now, back to my problem. I'm using a TorchScript model, exported from Python's PyTorch, in C++ code. Experiments conducted so far show that the output values are identical in Python and C++, so I think the model is exported/imported successfully.

However, there is a strange warning that I receive. In this minimal example

#include <torch/script.h>

#include <iostream>
#include <vector>

int main() {
    // Inference only, so disable gradient tracking.
    torch::NoGradGuard guard;

    torch::Device device = torch::kCPU;
    // same warning when: torch::Device device = torch::kCUDA;

    torch::Tensor t1 = torch::rand({1, 3, 224, 224}, torch::kFloat).to(device);
    torch::Tensor t2 = torch::rand({1, 3, 224, 224}, torch::kFloat).to(device);
    torch::Tensor t3 = torch::rand({1, 3, 224, 224}, torch::kFloat).to(device);

    // Load the TorchScript module exported from Python.
    torch::jit::script::Module model = torch::jit::load("../cache/model.pt", device);

    std::cout << "Feedforwarding 1st time\n";
    std::vector<torch::jit::IValue> inputs{};
    inputs.push_back(t1);
    model.forward(inputs).toTensor();

    std::cout << "Feedforwarding 2nd time\n";
    inputs.clear();
    inputs.push_back(t2);
    model.forward(inputs).toTensor();

    std::cout << "Feedforwarding 3rd time\n";
    inputs.clear();
    inputs.push_back(t3);
    model.forward(inputs).toTensor();

    return 0;
}

the first feedforward of t1 seems fine, then the warning appears when feedforwarding t2, and afterwards everything is fine once again. The output is:

Feedforwarding 1st time
Feedforwarding 2nd time
[W graph_fuser.cpp:108] Warning: operator() profile_node %353 : int = prim::profile_ivalue(%out_dtype.1)
 does not have profile information (function operator())
Feedforwarding 3rd time

Also, this second evaluation takes abnormally long: approx. 5 seconds, compared to approx. 20 ms for the 1st, 3rd, 4th, … evaluations on my modest notebook GPU.
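For reference, this is roughly how I time each call (a minimal sketch; the helper name is made up and not part of the example above, and on CUDA a torch::cuda::synchronize() is needed so the asynchronous kernels are included in the measurement):

#include <torch/script.h>
#include <torch/cuda.h>

#include <chrono>
#include <iostream>
#include <vector>

// Hypothetical helper: run one forward pass and print its wall-clock time.
void timed_forward(torch::jit::script::Module& model,
                   const torch::Tensor& input,
                   const char* label) {
    std::vector<torch::jit::IValue> inputs{input};
    const auto start = std::chrono::steady_clock::now();
    model.forward(inputs).toTensor();
    if (input.is_cuda()) {
        // CUDA kernels launch asynchronously; wait for them to finish so
        // the measurement covers the actual computation.
        torch::cuda::synchronize();
    }
    const auto stop = std::chrono::steady_clock::now();
    std::cout << label << ": "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}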

Any help is appreciated.

This is the JIT looking for optimization opportunities.
These three (somewhat undocumented?) API calls from graph_executor.h / namespace torch::jit might help:

TORCH_API std::atomic<bool>& getProfilingMode();
TORCH_API std::atomic<bool>& getExecutorMode();
TORCH_API std::atomic<size_t>& getNumProfiledRuns();

Use getNumProfiledRuns to choose which run will be slow, or set one of the two flags above to false to disable profiling altogether.
Of course, only do this if you don’t need the fuser.
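
For example, something along these lines at program start, before the first forward() call (a minimal sketch; the exact header path varies between LibTorch versions, e.g. torch/csrc/jit/runtime/graph_executor.h in recent releases):

#include <torch/script.h>
// Declares the executor flags; the path may differ by version,
// e.g. <torch/csrc/jit/graph_executor.h> in older releases.
#include <torch/csrc/jit/runtime/graph_executor.h>

int main() {
    // Disable the profiling graph executor entirely: no profiling runs,
    // no fusion, and hence no slow "optimizing" second run.
    torch::jit::getProfilingMode() = false;
    torch::jit::getExecutorMode() = false;

    // Alternatively, keep profiling enabled but control how many runs
    // are profiled before the optimized graph is compiled:
    // torch::jit::getNumProfiledRuns() = 1;

    // ... load the module and run forward() as in the example above ...
    return 0;
}

The flags are process-wide atomics, so setting them once before any forward() call affects all subsequently executed modules.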

Best regards

Thomas