I found that batch normalization statistics such as running_mean and running_var are not included in model.parameters(). However, they do appear in model.state_dict(). I would like to know whether this is expected behavior or a mistake on my part.
I think you’re right: running_mean and running_var are included in model.state_dict() but not in model.parameters().
My understanding is that running_mean and running_var are just statistics computed from batches of data points. During the model update phase, i.e. when the computed gradients are applied, these statistics are not touched by the optimizer. model.parameters() only contains the parameters that are actually “trained” (updated via gradients) during the training process.
Actually, you can reach a similar conclusion by checking how an optimizer is constructed:
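A minimal sketch with a standalone nn.BatchNorm1d layer shows the distinction: running_mean and running_var are registered as buffers, not parameters, so an optimizer built from model.parameters() never receives them, while state_dict() contains both:

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)

# Learnable parameters: only weight (gamma) and bias (beta)
param_names = [name for name, _ in bn.named_parameters()]
# Buffers: the running statistics live here instead
buffer_names = [name for name, _ in bn.named_buffers()]

print(param_names)   # ['weight', 'bias']
print(buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']

# state_dict() holds both parameters and buffers, which is why
# running_mean/running_var show up there but not in parameters()
state_keys = sorted(bn.state_dict().keys())
print(state_keys)
```

So an optimizer like `torch.optim.SGD(bn.parameters(), lr=0.1)` only ever sees `weight` and `bias`.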
Thank you very much for your kind response. I totally agree with you that running_mean and running_var are calculated from batch statistics during training.
But at test time, we use pop_mean and pop_var, which represent the statistics of the entire dataset. Please refer to training step 2 in Le Quang Vu’s answer here. pop_mean and pop_var get updated during the training stage as well.
I suppose that pop_mean and pop_var here are equivalent to running_mean and running_var in PyTorch, so they also need to be updated?
Based on the pointer you provided, in TensorFlow pop_mean and pop_var are updated adaptively during training (batch by batch) from batch_mean and batch_var of the current batch, using some decay factor (say 0.99). During the test step, pop_mean/var are then used directly for model evaluation.
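That moving-average update can be sketched in plain Python. This is only an illustration of the rule described above, not a real TensorFlow API; the names pop_mean/pop_var/batch_mean/batch_var and the 0.99 decay follow the linked answer:

```python
def update_population_stats(pop_mean, pop_var, batch_mean, batch_var, decay=0.99):
    """Exponential moving average of the per-batch statistics.

    Each training batch nudges the population estimates toward the
    current batch's mean/variance by a factor of (1 - decay).
    """
    new_mean = decay * pop_mean + (1 - decay) * batch_mean
    new_var = decay * pop_var + (1 - decay) * batch_var
    return new_mean, new_var

# Starting from pop_mean=0.0, pop_var=1.0, one batch with mean 1.0, var 2.0:
pm, pv = update_population_stats(0.0, 1.0, 1.0, 2.0)
print(pm, pv)  # pop stats move only 1% of the way toward the batch stats
```

Note that PyTorch's BatchNorm layers use the same idea but express it with a `momentum` argument (default 0.1), which plays the role of `1 - decay` here.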
Thank you @zazzyy very much for your kind and detailed answer. It’s really helpful.
Maybe I didn’t explain this well in my first post. What I would actually like to do is freeze several modules of a pre-trained model and pass the other modules (including a ResNet block with batch norm layers) to an optimizer for training. However, when I checked the parameters passed to the optimizer, I found that running_mean and running_var are not included.
That’s why I’m wondering whether running_mean and running_var can be updated during training.
I supposed that PyTorch will still update running_mean and running_var even though they are not explicitly passed to the optimizer.
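Yes, that supposition is correct, and it is easy to verify with a quick sketch: in train() mode a forward pass alone updates the running statistics, with no optimizer involved at all, while in eval() mode they stay frozen. This also means that to truly freeze a batch norm module you should put it in eval() mode, not merely exclude its parameters from the optimizer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
before = bn.running_mean.clone()   # initialized to zeros

bn.train()                         # running stats update in train mode
x = torch.randn(16, 3) + 5.0       # a batch with a clearly nonzero mean
bn(x)                              # forward pass alone updates the buffers
after = bn.running_mean.clone()    # has moved toward the batch mean

bn.eval()                          # in eval mode, running stats are frozen
bn(torch.randn(16, 3) + 5.0)       # this forward pass changes nothing
print(before, after, bn.running_mean)
```

So if your frozen modules contain batch norm layers and you do not want their statistics to drift, call `.eval()` on those modules each epoch (since `model.train()` switches every submodule back to train mode).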