How many parameters are there in BERT base, which has 12 encoder blocks? I figure it's 192 (counting weight and bias tensors across the encoder blocks), but I am not sure. Is there a convenient way to load these parameters without depending on the actual BERT model? I had a look at the Hugging Face transformers library, but I found it very complicated and it required downloading a lot of dependencies. Ideally, I just want to download the pretrained weights as numpy arrays.
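For context, here is the back-of-envelope arithmetic behind my count. This is only a sketch assuming the standard bert-base-uncased configuration (hidden size 768, feed-forward size 3072, vocabulary 30522, 512 positions, 2 token types); those config numbers are my assumptions, not something I have verified against a checkpoint:

```python
# Back-of-envelope parameter count for BERT base (assumed bert-base-uncased config).
H, FFN, L = 768, 3072, 12          # hidden size, feed-forward size, encoder blocks
VOCAB, POS, TYPES = 30522, 512, 2  # vocabulary, max positions, token types

# Tensors in one encoder block: Q/K/V/attention-output weights + biases (8),
# attention LayerNorm gamma + beta (2), FFN in/out weights + biases (4),
# output LayerNorm gamma + beta (2) -> 16 tensors per block.
tensors_per_block = 16
encoder_tensors = L * tensors_per_block   # 12 * 16 = 192 tensors

# Scalar parameters per block.
attention = 4 * (H * H + H)               # Q, K, V, and output projections
attn_ln   = 2 * H                         # LayerNorm gamma + beta
ffn       = (H * FFN + FFN) + (FFN * H + H)
out_ln    = 2 * H
per_block = attention + attn_ln + ffn + out_ln

embeddings = (VOCAB + POS + TYPES) * H + 2 * H   # word/pos/type tables + embedding LayerNorm
pooler     = H * H + H                           # [CLS] pooler layer

total = embeddings + L * per_block + pooler
print(encoder_tensors)  # 192
print(total)            # 109482240
```

So "192" is the number of parameter *tensors* in the encoder stack; the number of scalar parameters comes out near the commonly cited ~110M figure.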
Another question related to this: when fine-tuning BERT starting from the pretrained weights, do we update all parameters in the network, or only the parameters of the FFNN head that comes after the transformer encoder blocks?
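To make the second question concrete, here is a toy sketch of what I mean by the two options; the parameter names and shapes are hypothetical stand-ins, not taken from any real checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained checkpoint: parameter name -> numpy array.
params = {
    "encoder.layer.0.attention.query.weight": rng.standard_normal((4, 4)),
    "encoder.layer.0.ffn.dense.weight":       rng.standard_normal((4, 4)),
    "classifier.weight":                      rng.standard_normal((2, 4)),  # task head
}

def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient step, but only to the tensors listed in `trainable`."""
    for name in trainable:
        params[name] -= lr * grads[name]

grads = {name: np.ones_like(p) for name, p in params.items()}
before = {name: p.copy() for name, p in params.items()}

# Option A: full fine-tuning -- every parameter gets updated.
full = list(params)
# Option B: head-only -- freeze the encoder, update just the classifier.
head_only = [n for n in params if n.startswith("classifier")]

sgd_step(params, grads, head_only)  # encoder tensors are left untouched
```

With `head_only` the encoder weights stay exactly at their pretrained values and only the classifier moves; with `full` every tensor would be updated.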