I traced a model and I am using it within libtorch, but for some bizarre reason the results are different on each run, and I am totally baffled by this.
So, I load the model like so:
torch::jit::script::Module module_af;
module_af = torch::jit::load(af_model_path);
module_af.eval();
// std::cout << "Model Load ok\n";
filelog.get()->info("Model Load ok");
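In case RNG state could matter (I would not expect it to for a plain forward pass, but to rule it out), my understanding is that the global seed and the cuDNN flags can be pinned right after loading; a sketch of what I mean:
torch::manual_seed(0);                            // pin the global RNG state
at::globalContext().setDeterministicCuDNN(true);  // only relevant if the model runs on CUDA
at::globalContext().setBenchmarkCuDNN(false);     // stop cuDNN autotuning from picking different kernels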
and I prepare the input like so:
Mat img4encodingBGR = imread(allign_filename, cv::IMREAD_COLOR); // imread's second argument is an ImreadMode flag, not a colour-conversion code
Mat img4encodingRGB;
cv::cvtColor(img4encodingBGR, img4encodingRGB, cv::COLOR_BGR2RGB);
auto img2encode = torch::from_blob(img4encodingRGB.data, {img4encodingRGB.rows, img4encodingRGB.cols, img4encodingRGB.channels()}, at::kByte); // wraps the Mat's memory without copying
img2encode = img2encode.to(at::kFloat).div(255).unsqueeze(0); // uint8 -> float in [0,1] (this makes an owned copy), add batch dim
img2encode = img2encode.permute({ 0, 3, 1, 2 });              // NHWC -> NCHW
img2encode.sub_(0.5).div_(0.5);                               // normalize to [-1, 1]
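To narrow down where the divergence starts, a cheap check is to checksum the input tensor right before the forward pass; if this sum already differs between runs, the problem is on the image/preprocessing side rather than in the model (just a debugging sketch):
// If this value differs across runs, the nondeterminism is already in the input,
// not in forward().
std::cout << "input checksum: " << img2encode.sum().item<float>() << "\n";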
I run the forward like so:
std::vector<torch::jit::IValue> arcface_inputs;
arcface_inputs.push_back(img2encode);
at::Tensor embeds0 = module_af.forward(arcface_inputs).toTensor();
std::cout << embeds0; // Gives a different output on each run.
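Another check that seems worth doing (a sketch, reusing the tensors above): run forward a second time in the same process and diff the two results. If even these in-process outputs differ, the module itself is nondeterministic, e.g. a dropout layer that eval() did not reach:
torch::NoGradGuard no_grad;  // plain inference, no autograd bookkeeping
at::Tensor embeds1 = module_af.forward(arcface_inputs).toTensor();
std::cout << "in-process max diff: "
          << (embeds0 - embeds1).abs().max().item<float>() << "\n";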
I am really baffled by this. The problem seems even worse: on two machines the results are identical across consecutive runs, but on two other machines they are not. All packages are EXACTLY the same (libtorch 1.6, and the above code compiled with CMake).
It reminds me of undefined behaviour, but I am totally lost, because on two machines (a server and a VM) the results are identical, while on the other two they are not.
I have triple-checked everything to see if I am doing something stupid, but it does not seem like it, hence this post.
I hope someone can point me to clues as to what I could be doing wrong.
Example output, run 1:
Columns 1 to 10: -0.1005 -0.1768 -0.2082 0.1240 0.1185 0.3801 0.1378 0.1269 -0.3572 -1.1453
run 2:
Columns 1 to 10: -0.1861 -0.3326 -0.3739 0.2302 0.1730 0.5391 0.1965 0.1972 -0.5481 -1.7317