Running stats handling on inference flow

Hello all,

I have been trying to trace and export some vision models that contain operators like BatchNorm2d, and I end up with all the running stats (e.g. running_mean/running_var) as buffer inputs instead of parameters.

I have had to manually remove them and handle this with something like:

import torch.nn as nn

def _buffer_to_parameter(module, name):
    # Pop the buffer and re-register it as a non-trainable parameter
    buf = getattr(module, name)
    delattr(module, name)
    module.register_parameter(name, nn.Parameter(buf, requires_grad=False))

def clear_running_stats_for_inference(model):
    for _, module in model.named_modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            # Disable training semantics
            module.track_running_stats = False
            module.training = False

            # Remove batch tracking
            if hasattr(module, "num_batches_tracked"):
                delattr(module, "num_batches_tracked")

            # Re-register running_mean/running_var as parameters
            _buffer_to_parameter(module, "running_mean")
            _buffer_to_parameter(module, "running_var")

As I am new to this, I would like to understand whether it would make sense for these stats to be handled like constant parameters during inference, and what the best way forward is to avoid this explicit handling.
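To make the effect of the conversion concrete, here is a self-contained sketch of what it does to a single BatchNorm layer (the `_to_param` helper below is my own illustration, not part of the snippet above):

```python
import torch.nn as nn

def _to_param(module, name):
    # Illustrative helper: pop a buffer and re-register it as a frozen parameter
    buf = getattr(module, name)
    delattr(module, name)
    module.register_parameter(name, nn.Parameter(buf, requires_grad=False))

bn = nn.BatchNorm2d(3).eval()
bn.track_running_stats = False
delattr(bn, "num_batches_tracked")
_to_param(bn, "running_mean")
_to_param(bn, "running_var")

print(sorted(name for name, _ in bn.named_parameters()))
# ['bias', 'running_mean', 'running_var', 'weight']
print(list(bn.named_buffers()))
# []
```

After the conversion the layer has no buffers left, so nothing gets lifted into graph inputs during export.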

Could you describe why you want to transform the buffers to parameters if you are not training them via gradient descent?

Hey @ptrblck,

From what I gather, parameters are meant for trainable semantics; what I wanted to achieve for these buffers is constant-ness when exporting for inference. Moving them to parameters helped me achieve this.

To elaborate: I am trying to export MobileNet and use torch-mlir to lower to the torch dialect, but I notice that all these stats end up becoming inputs to the model because they are stored in buffers, even though they are really constant during inference.

It is highly likely that this is something that can be handled at the torch-mlir level; I am trying to figure that out.

This is correct. nn.Parameters are trainable and can be registered into nn.Modules.
E.g. the .weight and .bias attributes of nn.Linear are trainable parameters:

lin = nn.Linear(10, 10)
print(lin.weight)
# Parameter containing:
# tensor([[-0.2245,  0.2853, -0.2039,  0.1198,  0.1099,  0.2087,  0.1796, -0.1416,
#           0.2469, -0.2738],
#         ...

for name, param in lin.named_parameters():
    print(name)
# weight
# bias

In contrast to parameters, buffers are also registered to modules but are not trainable.
The running stats are buffers since they are not trained via gradient descent and no gradient will be computed for them.

bn = nn.BatchNorm2d(3)
for name, param in bn.named_parameters():
    print(name)
# weight
# bias
    
for name, buffer in bn.named_buffers():
    print(name)
# running_mean
# running_var
# num_batches_tracked

Transforming the running_* stats into parameters will make them trainable by default, whereas you say you want to achieve “constant-ness”, so I assume you want to keep them static.
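To illustrate the pitfall: nn.Parameter defaults to requires_grad=True, so a buffer converted this way becomes trainable unless it is frozen explicitly. A minimal sketch (the conversion here is just for illustration):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(3).eval()

# Naively re-register the running mean as a parameter
mean = bn.running_mean
delattr(bn, "running_mean")
bn.register_parameter("running_mean", nn.Parameter(mean))

# nn.Parameter is trainable by default
print(bn.running_mean.requires_grad)  # True

# Freeze it explicitly if it should stay constant
bn.running_mean.requires_grad_(False)
print(bn.running_mean.requires_grad)  # False
```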

In addition to Piotr’s comments: is the problem you’re trying to solve the fact that the batchnorm buffers are showing up as graph inputs in the exported graph, but you’d like them to be constants on the graph instead?

If you call .module() on the output of export, any module state (both params and buffers) will be unlifted back into module state. You can do something like:

torch.export.export(model, example_args).run_decompositions().module()

(this returns a torch.fx.GraphModule, where any params/buffers that had previously been lifted into graph inputs will once again become module state)