Do we often tune eps and momentum in batch norm in practice?
eps is there for numerical stability, so it is unusual to tune it. momentum implicitly defines how many of the most recent samples are used to estimate the (population) moments, so it can be reasonable to decrease momentum if the estimates are not changing smoothly (for example, due to a small batch size).
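To make the "number of last samples" point concrete, here is a minimal pure-Python sketch. It assumes the PyTorch-style update rule, running = (1 - momentum) * running + momentum * batch_stat; other frameworks may define momentum the other way around. The names running_mean_trace and jitter, the batch size of 4, and the Gaussian toy data are made up for illustration.

```python
import random
import statistics


def running_mean_trace(batch_means, momentum, init=0.0):
    """Exponential moving average of per-batch means.

    Assumes the PyTorch-style convention, where a SMALLER momentum
    gives more weight to history (roughly the last 1/momentum batches).
    """
    running = init
    trace = []
    for batch_mean in batch_means:
        running = (1.0 - momentum) * running + momentum * batch_mean
        trace.append(running)
    return trace


def jitter(trace, skip=1000):
    """Spread of the running estimate after the warm-up steps are skipped."""
    return statistics.pstdev(trace[skip:])


random.seed(0)
batch_size = 4  # deliberately small, so each batch mean is noisy
batch_means = [
    statistics.fmean(random.gauss(1.0, 1.0) for _ in range(batch_size))
    for _ in range(2000)
]

jitter_default = jitter(running_mean_trace(batch_means, momentum=0.1))
jitter_small = jitter(running_mean_trace(batch_means, momentum=0.01))

print(f"momentum=0.1  -> jitter {jitter_default:.4f}")
print(f"momentum=0.01 -> jitter {jitter_small:.4f}")
```

With the smaller momentum the running estimate should fluctuate noticeably less, at the cost of reacting more slowly when the true statistics change.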
So would it be better to decrease the momentum when using a small batch size, to improve accuracy?
Well, it won’t hurt to keep, say, batch_size/momentum constant when decreasing the batch size.
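As a rough sketch of that rule of thumb: keeping batch_size/momentum constant means scaling momentum in proportion to the batch size. The helper name scaled_momentum and the reference values (batch size 32, momentum 0.1) are assumptions picked for illustration, not recommended settings.

```python
def scaled_momentum(batch_size, ref_batch_size=32, ref_momentum=0.1):
    """Scale momentum so that batch_size / momentum stays constant.

    ref_batch_size and ref_momentum are hypothetical reference values:
    a setting that is assumed to work well at the larger batch size.
    """
    if batch_size <= 0 or ref_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    # Cap at 1.0 so the result stays a valid averaging weight.
    return min(1.0, ref_momentum * batch_size / ref_batch_size)


for bs in (32, 16, 8, 4):
    m = scaled_momentum(bs)
    # The ratio bs / m is the same on every line (up to float rounding).
    print(f"batch_size={bs:2d}  momentum={m:.4f}  ratio={bs / m:.1f}")
```

The result would then be passed as the momentum argument of the batch norm layer, for example `torch.nn.BatchNorm2d(num_features, momentum=scaled_momentum(8))` in PyTorch.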
I’d rather say that you can sometimes fix batchnorm problems by using a smaller momentum, i.e. it can only improve subpar results, and only when batchnorm is actually the issue.