InstanceNorm vs BatchNorm num_batches_tracked

Is there a reason why num_batches_tracked gets updated in BN but not in IN?

import torch
torch.__version__
# '1.13.1'
# Create a batch of 16 samples, each with 2 channels and a length of 10
x = torch.randn(16, 2, 10)

InstanceNorm:

# Create an instance normalization layer with track_running_stats=True
norm_layer = torch.nn.InstanceNorm1d(2, track_running_stats=True, affine=False)

# Process the input data using the normalization layer
y = norm_layer(x)

# Print the running statistics of the layer
print(norm_layer.state_dict())
# OrderedDict([('running_mean', tensor([-0.0022,  0.0019])), ('running_var', tensor([1.0003, 1.0025])), ('num_batches_tracked', tensor(0))])
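
Repeating the forward pass shows the same pattern (a quick sketch): the running statistics keep updating, but the counter never moves.

# Run several more batches through the same InstanceNorm layer
for _ in range(5):
    norm_layer(torch.randn(16, 2, 10))

print(norm_layer.num_batches_tracked)  # still tensor(0)
print(norm_layer.running_mean)         # no longer all zeros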

BatchNorm:

# Create a batch normalization layer with track_running_stats=True
norm_layer = torch.nn.BatchNorm1d(2, track_running_stats=True, affine=False)

# Process the input data using the normalization layer
y = norm_layer(x)

# Print the running statistics of the layer
print(norm_layer.state_dict())
# OrderedDict([('running_mean', tensor([-0.0022,  0.0019])), ('running_var', tensor([0.9972, 1.0007])), ('num_batches_tracked', tensor(1))])
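
And for comparison, the batchnorm counter increments once per forward pass in training mode (a quick sketch):

# Each additional forward pass in train() mode bumps the counter
for _ in range(3):
    norm_layer(torch.randn(16, 2, 10))
    print(norm_layer.num_batches_tracked)
# tensor(2)
# tensor(3)
# tensor(4)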

num_batches_tracked is simply not relevant for InstanceNorm, since it doesn't need to keep track of running statistics across multiple batches and can be applied independently per instance.
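
A minimal sketch of that point: the output is the same whether the samples go through together or one at a time, so there is no batch-level bookkeeping to count (using the default track_running_stats=False here).

import torch

norm_layer = torch.nn.InstanceNorm1d(2, affine=False)
x = torch.randn(16, 2, 10)

y_batch = norm_layer(x)
# Normalize each sample on its own and stack the results back together
y_single = torch.cat([norm_layer(xi.unsqueeze(0)) for xi in x], dim=0)
print(torch.allclose(y_batch, y_single, atol=1e-6))  # True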

Thanks Mark! I have noticed that running_mean and running_var get updated but num_batches_tracked does not. My question is: since one can get the running_mean and running_var, why not num_batches_tracked as well? The documentation mentions keeping running estimates. What am I missing here?

If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation.

For example, in the following code I would expect num_batches_tracked to be 1 after the first run, since the other buffers are updated.

momentum = .1
eps = 1e-5
norm_layer = torch.nn.InstanceNorm1d(2, track_running_stats=True, affine=False, momentum=momentum, eps=eps)
print(norm_layer.state_dict())
norm_layer.train()

x = torch.randn(16, 2, 10)
y = norm_layer(x)
print(norm_layer.state_dict())

m = torch.mean(x, dim=2, keepdim=True)
s = torch.sqrt(torch.var(x, dim=2, keepdim=True, unbiased=False) + eps)
y_ = (x-m) / s
print(torch.allclose(y, y_))

print((1. - momentum) * 0. + momentum * torch.mean(x, dim=(0, 2))) # == running_mean
print(torch.mean(x, dim=(0, 2))) # != running_mean

Output:

OrderedDict([('running_mean', tensor([0., 0.])), ('running_var', tensor([1., 1.])), ('num_batches_tracked', tensor(0))])
OrderedDict([('running_mean', tensor([-0.0209, -0.0072])), ('running_var', tensor([1.0197, 0.9940])), ('num_batches_tracked', tensor(0))])
True
tensor([-0.0209, -0.0072])
tensor([-0.2089, -0.0724])
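
As a side note on the quoted documentation: the running estimates are indeed what gets used once the layer is in eval() mode. A rough sketch to check that:

import torch

norm_layer = torch.nn.InstanceNorm1d(2, track_running_stats=True, affine=False)
for _ in range(100):
    norm_layer(torch.randn(16, 2, 10))  # accumulate running estimates in train mode

norm_layer.eval()
x = torch.randn(16, 2, 10)
y = norm_layer(x)

# Normalizing manually with the running buffers reproduces the eval output
rm = norm_layer.running_mean.view(1, 2, 1)
rv = norm_layer.running_var.view(1, 2, 1)
print(torch.allclose(y, (x - rm) / torch.sqrt(rv + norm_layer.eps), atol=1e-5))  # True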

num_batches_tracked is used in batchnorm layers to update the running stats with the cumulative moving average, which is used if the momentum argument is set to None.
This option is not supported in instancenorm layers as these expect a valid floating point value for the momentum and don’t accept None.
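
For example (a quick sketch of the batchnorm behaviour): with momentum=None the running stats become a plain cumulative average, and num_batches_tracked supplies the 1/n factor.

import torch

bn = torch.nn.BatchNorm1d(2, momentum=None, affine=False)
xs = [torch.randn(16, 2, 10) for _ in range(3)]
for x in xs:
    bn(x)

print(bn.num_batches_tracked)  # tensor(3)
# The cumulative moving average of the per-batch means matches running_mean
batch_means = torch.stack([x.mean(dim=(0, 2)) for x in xs])
print(torch.allclose(bn.running_mean, batch_means.mean(dim=0), atol=1e-6))  # True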

Thanks Piotr - it makes sense now. Just out of curiosity, may I ask why momentum=None is not an option in InstanceNorm?

Unfortunately, I don’t know why it’s disallowed or if it was just never implemented.
Internally batch_norm would be used as seen here, so I would assume a cumulative moving average should be possible, but I also don't know whether it would make sense.
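
If you want to experiment, one rough (unofficial) workaround is to emulate the cumulative moving average yourself by setting momentum to 1/n before each forward pass; this is just a sketch, not a supported option:

import torch

norm_layer = torch.nn.InstanceNorm1d(2, track_running_stats=True, affine=False)
batch_means = []
for step in range(1, 4):
    norm_layer.momentum = 1.0 / step  # manual schedule mimicking momentum=None in batchnorm
    x = torch.randn(16, 2, 10)
    norm_layer(x)
    batch_means.append(x.mean(dim=(0, 2)))

# running_mean now equals the plain average of the per-batch means
print(torch.allclose(norm_layer.running_mean, torch.stack(batch_means).mean(dim=0), atol=1e-6))  # True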