In Caffe we can set the learning rate to 0 by using `lr_mult: 0`. It means only the mean/var are calculated, but no parameters are learned during training.
In PyTorch, is `affine=False` the same as `lr_mult: 0`?
nn.BatchNorm3d(num_features=64, momentum=0.999, affine=False)
affine=False will remove the learnable gamma and beta terms from the calculation, so the layer only normalizes with the mean and var. So that's basically what you want.
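To make this concrete, here is a small sketch (assuming the `nn.BatchNorm3d` call from the question) showing that `affine=False` leaves the layer with no learnable parameters at all, so there is nothing for an optimizer to update:

```python
import torch
import torch.nn as nn

# affine=False: no learnable gamma (weight) / beta (bias), only statistics
bn = nn.BatchNorm3d(num_features=64, affine=False)
print(bn.weight, bn.bias)               # None None
print(list(bn.parameters()))            # [] -- nothing for the optimizer

# with affine=True (the default), gamma and beta are registered parameters
bn_affine = nn.BatchNorm3d(num_features=64)
print(len(list(bn_affine.parameters())))  # 2 (weight and bias)
```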
I don't know how Caffe works, but setting the learning rate to 0 is something different in my opinion, since you could still have the gamma and beta terms applied with constant (random) values.
Also, be careful with the momentum argument, since it is different from the one used in optimizer classes and from the conventional notion of momentum. Have a look at this note. You probably want to lower it, i.e. set it to something like momentum=0.001.
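A short sketch of what the note means: in PyTorch, `momentum` is the weight given to the *new* batch statistic, i.e. `running = (1 - momentum) * running + momentum * batch_stat`, whereas Caffe's `moving_average_fraction` is the weight kept on the old running value. So Caffe's 0.999 would correspond to roughly 0.001 here (a 1D layer is used below just to keep the example small):

```python
import torch
import torch.nn as nn

# PyTorch update rule: running = (1 - momentum) * running + momentum * batch_stat
bn = nn.BatchNorm1d(num_features=3, momentum=0.1)
x = torch.randn(100, 3) * 2.0 + 5.0   # batch with mean ~5

bn.train()
bn(x)  # one training step updates the running statistics

batch_mean = x.mean(dim=0)
expected = (1 - 0.1) * torch.zeros(3) + 0.1 * batch_mean
print(torch.allclose(bn.running_mean, expected, atol=1e-5))  # True
```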
Thank you very much!
Caffe uses base_lr to set the base learning rate. The learning rate of a param is base_lr * lr_mult, so parameters in a layer can be set separately in Caffe. But maybe I misunderstood the meaning of some params. I'll check again!
I checked the params again, I was wrong about the meaning of lr_mult: 0.
I noticed that track_running_stats is a new parameter in version 0.4; I didn't see it in previous versions.
The doc says that this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes.
What is the meaning of "tracks the running mean and variance"? Didn't it do that in previous versions?
If track_running_stats is set to True, the vanilla batch norm is used, i.e. the batch statistics are accumulated during training in running_mean and running_var. Setting the model to evaluation mode will use these statistics for all validation samples.
Setting it to False will use the batch statistics of the current batch even in evaluation mode.
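A minimal sketch of that difference: with track_running_stats=False, the layer keeps no running buffers at all, so even in eval() mode the current batch's own statistics are used for normalization:

```python
import torch
import torch.nn as nn

# track_running_stats=False: no running_mean / running_var buffers exist
bn = nn.BatchNorm1d(num_features=3, affine=False, track_running_stats=False)
print(bn.running_mean)   # None -- nothing is tracked

x = torch.randn(8, 3) * 3.0 + 10.0   # batch far from zero mean / unit var
bn.eval()
out = bn(x)              # still normalized with this batch's statistics
print(out.mean(dim=0))   # ~0 even in eval mode
```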