Inconsistency between traced and true model outputs

I am trying to serialize my model so I can run it in C++. Before moving to C++, I am testing in Python by comparing the outputs of the original model and the traced model on the same input, and I get different results. Here is my Python code:

import torch
from torchvision import models
import torch.nn as nn

model_file = r'D:\workspace\....\model.pt'
model_ft = models.resnet50()
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 6)
model_ft.load_state_dict(torch.load(model_file))

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model_ft, example)

# Test both outputs of original and traced model to compare
model_ft.eval()
with torch.no_grad():
    output_model = model_ft(torch.zeros(1, 3, 224, 224))

output_traced_model = traced_script_module(torch.zeros(1, 3, 224, 224))

print('output_model = ' + str(output_model))
print('output_traced_model = ' + str(output_traced_model))
# save traced model
traced_script_module.save("traced_resnet_model.pt")

and the variables 'output_model' and 'output_traced_model' are completely different:

output_model = tensor([[-0.0805,  0.2096,  0.0873, -0.0468, -0.1598, -0.0375]])
output_traced_model = tensor([[-0.4763,  1.5731,  0.2112,  0.3496, -1.6906,  0.0191]],
       grad_fn=<DifferentiableGraphBackward>)

The difference arises because you are running the eager model in eval() mode, while the traced model was captured in train() mode.
From the docs:

In the returned ScriptModule, operations that have different behaviors in training and eval modes will always behave as if it is in the mode it was in during tracing, no matter which mode the ScriptModule is in.
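In other words, call eval() before torch.jit.trace, not after. A minimal sketch of the effect, using a small stand-in model (assumed here for brevity, not the thread's ResNet-50) with BatchNorm and Dropout, the layers whose behavior differs between train and eval:

```python
import torch
import torch.nn as nn

# Tiny stand-in model containing BatchNorm and Dropout, the layers
# whose behavior differs between train() and eval() mode.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.Dropout(0.5),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 6),
)

# Switch to eval() BEFORE tracing: the trace freezes whatever mode
# is active, regardless of the mode set on the ScriptModule later.
model.eval()

example = torch.rand(1, 3, 4, 4)
traced = torch.jit.trace(model, example)

x = torch.zeros(1, 3, 4, 4)
with torch.no_grad():
    out_eager = model(x)
    out_traced = traced(x)

# With eval() set before tracing, the two outputs agree.
print(torch.allclose(out_eager, out_traced, atol=1e-6))
```

If the model were traced in train() mode instead, BatchNorm would bake in batch statistics and the outputs would diverge, which is exactly what the original post observed.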


Thanks for the reply.
I can confirm that switching the original model to eval() mode before creating the traced model solves the issue.