JIT traced/scripted models are expected to produce the same output as eager models when given the same input. This seems to hold when the weights are randomly initialized, but not when pre-trained weights are loaded.
Below is a code snippet to reproduce the issue:
Without loading weights:
import torch
from torchvision.models import resnet18, ResNet18_Weights
# Create eager, scripted, and traced models
eager_model = resnet18().cuda(0)
eager_model.eval()
script_model = torch.jit.script(eager_model).cuda()
script_model.eval()
trace_input = torch.randn(4, 3, 224, 224).cuda()
traced_model = torch.jit.trace(eager_model, trace_input).cuda()
traced_model.eval()
# Random input and feature extraction
x = torch.randn(16, 3, 224, 224).cuda()
with torch.no_grad():
    eager_out = eager_model(x)
    script_out = script_model(x)
    traced_out = traced_model(x)
if not torch.allclose(eager_out, script_out):
    print(f'Scripted: {(eager_out - script_out).abs().sum()}')
if not torch.allclose(eager_out, traced_out):
    print(f'Traced: {(eager_out - traced_out).abs().sum()}')
The code above produces no output, i.e., the eager, scripted, and traced outputs all match within torch.allclose tolerances.
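For reference, torch.allclose defaults to rtol=1e-05 and atol=1e-08. The raw element-wise gap can also be printed directly (reusing the variables from the snippet above):
# Report the maximum element-wise difference instead of relying on
# torch.allclose's default tolerances
print((eager_out - script_out).abs().max())
print((eager_out - traced_out).abs().max())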
Using pre-trained weights:
Now consider the code below:
import torch
from torchvision.models import resnet18, ResNet18_Weights
# Create eager, scripted, traced models
eager_model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1).cuda(0)
eager_model.eval()
script_model = torch.jit.script(eager_model).cuda()
script_model.eval()
trace_input = torch.randn(4, 3, 224, 224).cuda()
traced_model = torch.jit.trace(eager_model, trace_input).cuda()
traced_model.eval()
# Random input for feature extraction
x = torch.randn(16, 3, 224, 224).cuda()
with torch.no_grad():
    eager_out = eager_model(x)
    script_out = script_model(x)
    traced_out = traced_model(x)
if not torch.allclose(eager_out, script_out):
    print(f'Scripted: {(eager_out - script_out).abs().sum()}')
if not torch.allclose(eager_out, traced_out):
    print(f'Traced: {(eager_out - traced_out).abs().sum()}')
The output of the code is as follows:
Scripted: 4.249152183532715
Traced: 4.250102996826172
Any ideas what's going on? Is this expected behavior? A total difference of ~4.25 is small given that the output is a [16 x 1000] tensor, but it isn't negligible either. For classification this probably doesn't matter much, but I suspect it could cause problems in cases where precision is critical.
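For what it's worth, here is how I would probe the scale of the error (a sketch reusing eager_model, eager_out, script_out, and x from the snippet above; the float64 CPU re-run is just my guess at a way to separate rounding noise from a real bug):
import copy

# A sum of ~4.25 over 16*1000 elements is roughly 2.7e-4 per element on
# average; the per-element maximum is more telling
print((eager_out - script_out).abs().max())

# Re-run the comparison in float64 on CPU; if the gap (mostly) vanishes,
# the discrepancy is likely float32 rounding from different GPU kernels
eager_cpu = copy.deepcopy(eager_model).double().cpu()
script_cpu = torch.jit.script(eager_cpu)
x64 = x.double().cpu()
with torch.no_grad():
    print((eager_cpu(x64) - script_cpu(x64)).abs().max())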
BTW, I am using torch==1.13.1 and torchvision==0.14.1.