Hi, I have recently been trying to convert StarGAN v1 from PyTorch to ONNX. The model contains an instance normalization layer with track_running_stats=True, and when I exported the model to ONNX it turned out that the exporter does not export the running mean/variance. Nevertheless, the ONNX model still gives results comparable to the original model. Wondering why, I ran a little experiment to understand the difference between batch norm, instance norm with running mean/variance, and instance norm without them. I initialized three layers with the same weights.
import numpy as np
import torch
import torch.nn as nn

normTrue = nn.InstanceNorm2d(64, affine=True, track_running_stats=True)
normFalse = nn.InstanceNorm2d(64, affine=True)
bnorm = nn.BatchNorm2d(64, affine=True, track_running_stats=True)

input = torch.randn(10, 64, 128, 128)
w = torch.rand(64)
b = torch.rand(64)
m = torch.rand(64)
v = torch.rand(64)
with torch.no_grad():
    normTrue.weight = nn.Parameter(w)
    normTrue.bias = nn.Parameter(b)
    normTrue.running_mean = m
    normTrue.running_var = v

    bnorm.weight = nn.Parameter(w)
    bnorm.bias = nn.Parameter(b)
    bnorm.running_mean = m
    bnorm.running_var = v

    normFalse.weight = nn.Parameter(w)
    normFalse.bias = nn.Parameter(b)
It turned out that in training mode, instance normalization with and without running-stats tracking behaves exactly the same.
with torch.no_grad():
    normoutTrue = normTrue(input).detach().cpu().numpy()
    normoutFalse = normFalse(input).detach().cpu().numpy()
    bnormout = bnorm(input).detach().cpu().numpy()

print(np.max(np.abs(normoutTrue - bnormout)))
print(np.max(np.abs(normoutTrue - normoutFalse)))
print(np.max(np.abs(bnormout - normoutFalse)))
0.05608654
0.0
0.05608654
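To convince myself why the running statistics are irrelevant here, I checked that a training-mode InstanceNorm2d can be reproduced by hand from per-sample, per-channel statistics alone. This is just a minimal sketch with small, arbitrary tensor sizes, not part of the experiment above:

```python
import torch
import torch.nn as nn

# Small, arbitrary sizes for a quick check; eps is the layer default.
norm = nn.InstanceNorm2d(3, affine=True, track_running_stats=True)
x = torch.randn(2, 3, 8, 8)

with torch.no_grad():
    out = norm(x)  # training mode: normalizes each (sample, channel) plane

    # Manual instance normalization: per-sample, per-channel mean/var,
    # ignoring the running statistics entirely.
    mean = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    manual = (x - mean) / torch.sqrt(var + norm.eps)
    manual = manual * norm.weight.view(1, -1, 1, 1) + norm.bias.view(1, -1, 1, 1)

# Difference is only float32 rounding noise.
print(torch.max(torch.abs(out - manual)))
```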
But in inference mode they are different; in fact, instance norm with tracked running mean/variance behaves like batch norm.
normTrue.eval()
normFalse.eval()
bnorm.eval()

torch.onnx.export(normTrue,                      # model being run
                  torch.rand(10, 64, 128, 128),  # model input
                  "./norm.onnx")
torch.onnx.export(bnorm,                         # model being run
                  torch.rand(10, 64, 128, 128),  # model input
                  "./bnorm.onnx")
with torch.no_grad():
    normoutTrue = normTrue(input).detach().cpu().numpy()
    normoutFalse = normFalse(input).detach().cpu().numpy()
    bnormout = bnorm(input).detach().cpu().numpy()

print(np.max(np.abs(normoutTrue - bnormout)))
print(np.max(np.abs(normoutTrue - normoutFalse)))
print(np.max(np.abs(bnormout - normoutFalse)))
9.536743e-07
6.2534113
6.2534113
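The eval-mode behavior can also be reproduced by hand: with track_running_stats=True, an InstanceNorm2d in eval mode normalizes with the stored running statistics per channel, exactly the BatchNorm eval formula. Again just a minimal sketch with small, arbitrary sizes and made-up running stats:

```python
import torch
import torch.nn as nn

inorm = nn.InstanceNorm2d(3, affine=True, track_running_stats=True)
x = torch.randn(2, 3, 8, 8)

with torch.no_grad():
    # Made-up running statistics so the check is non-trivial.
    inorm.running_mean.copy_(torch.rand(3))
    inorm.running_var.copy_(torch.rand(3) + 0.5)

inorm.eval()
with torch.no_grad():
    out = inorm(x)  # eval mode: uses running stats, not per-instance stats

    # Manual eval-mode normalization with the stored running statistics,
    # i.e. the same formula BatchNorm2d uses in eval mode.
    rm = inorm.running_mean.view(1, -1, 1, 1)
    rv = inorm.running_var.view(1, -1, 1, 1)
    manual = (x - rm) / torch.sqrt(rv + inorm.eps)
    manual = manual * inorm.weight.view(1, -1, 1, 1) + inorm.bias.view(1, -1, 1, 1)

print(torch.max(torch.abs(out - manual)))
```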
So I have a hard time understanding why one would normalize by the running mean/variance at inference time if they were not used during training, and I am curious whether the ONNX exporter ignores the running mean/variance on purpose.
I would be grateful for any clarification.

Best regards