BERT additional pre-training

I would like to use the transformers (Hugging Face) library to further pre-train BERT. I found the masked LM / pre-training model and a usage example, but not a training example.
The original BERT repo has this explanation, which is great, but I would like to use PyTorch.
I'm not looking to fine-tune the model, just to pre-train it further on the IMDB dataset, starting from an already trained model.
Also, I assume that extracting the BERT weights without the masked LM head and loading them into a regular BERT model should not be an issue.
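Loading the encoder weights of a masked-LM checkpoint into a plain `BertModel` should indeed work: `from_pretrained` keeps the `bert.*` weights and drops the `cls.*` MLM head (the pooler, which `BertForMaskedLM` does not have, gets freshly initialized). A minimal sketch, assuming a tiny randomly initialized config so it runs offline; in practice you would start from something like `BertForMaskedLM.from_pretrained("bert-base-uncased")`:

```python
import tempfile

import torch
from transformers import BertConfig, BertForMaskedLM, BertModel

# Tiny config (assumption) so the sketch runs without downloading a checkpoint.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
mlm_model = BertForMaskedLM(config)

# ... additional pre-training of mlm_model would happen here ...

with tempfile.TemporaryDirectory() as out_dir:
    mlm_model.save_pretrained(out_dir)
    # The bare encoder keeps the "bert.*" weights and ignores the "cls.*" MLM head.
    encoder = BertModel.from_pretrained(out_dir)
    weights_match = torch.equal(
        encoder.embeddings.word_embeddings.weight,
        mlm_model.bert.embeddings.word_embeddings.weight,
    )

print(weights_match)  # True
```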


Did you figure out how to do this?

For now, I'm going with the original BERT repo in TensorFlow, and then I'll try converting to PyTorch, as explained here.

Use this repo

This is slightly old but works. Let me know if you face any issues.


This introduces the overhead of using Azure.
I was looking for an example using pytorch-pretrained-bert or transformers, maybe something from the model zoo, all native to PyTorch and agnostic to infrastructure.
Thanks though.

This uses the Hugging Face BERT. If you set the rank and world size explicitly, you don't need to use Azure.
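In plain PyTorch, setting the rank and world size explicitly goes through `torch.distributed.init_process_group`. A minimal single-process sketch, assuming the CPU `gloo` backend and a file-based rendezvous; `init_distributed` is a hypothetical helper, and the linked repo's own launcher arguments may be named differently:

```python
import os
import tempfile

import torch.distributed as dist


def init_distributed(rank: int, world_size: int, init_file: str) -> None:
    """Hypothetical helper: join a process group with an explicit rank and world size."""
    dist.init_process_group(
        backend="gloo",  # CPU backend; "nccl" is the usual choice on GPUs
        init_method=f"file://{init_file}",  # every process must point at the same file
        rank=rank,
        world_size=world_size,
    )


# Single-process demo: rank 0 in a world of size 1.
rendezvous = os.path.join(tempfile.mkdtemp(), "rendezvous")
init_distributed(rank=0, world_size=1, init_file=rendezvous)
rank, world_size = dist.get_rank(), dist.get_world_size()
dist.destroy_process_group()
print(rank, world_size)  # 0 1
```

With several machines, each process would call the helper with its own rank, the same world size, and a rendezvous file on a shared filesystem.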

Using the original Google repo and converting to torch works.
[EDIT] One caveat, though: the new model is much heavier and needs more computation time and power to use.

This might not be exactly what you are looking for, but it might be easier to follow the Hugging Face interface for fine-tuning (very well documented), adapt it to a given task, and then just use the hidden states of the BERT model to get the embeddings.
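Getting embeddings from the hidden states could look like the sketch below. It assumes a tiny random config and made-up token ids so it runs offline, and it mean-pools the last layer over non-padding tokens; taking the `[CLS]` vector is another common choice:

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random config (assumption); with a real checkpoint use BertModel.from_pretrained(...)
# together with the matching tokenizer.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config).eval()

input_ids = torch.tensor([[1, 5, 6, 7, 2, 0],
                          [1, 8, 9, 2, 0, 0]])  # toy ids, 0 = padding
attention_mask = (input_ids != 0).long()

with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=attention_mask,
                output_hidden_states=True)

# out.hidden_states holds the embedding output plus one tensor per layer.
last_hidden = out.last_hidden_state                        # (batch, seq, hidden)
mask = attention_mask.unsqueeze(-1).float()
sentence_emb = (last_hidden * mask).sum(1) / mask.sum(1)   # mean over real tokens
print(len(out.hidden_states), sentence_emb.shape)  # 3 torch.Size([2, 32])
```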

I specifically wanted to do the additional pretraining.
It is one of the steps used in some papers to adjust the model to the data distribution of the target domain.
The masked LM and next-sentence prediction heads are available in Hugging Face, but I haven't seen any training examples.
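`BertForPreTraining` carries both heads and returns their combined loss when given MLM labels and a next-sentence label, so one training step can be sketched as below. The config is tiny and random, and the token ids (including the `[MASK]` id 3) are hypothetical; in practice you would load pretrained weights with `from_pretrained`:

```python
import torch
from transformers import BertConfig, BertForPreTraining

torch.manual_seed(0)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertForPreTraining(config)  # in practice: BertForPreTraining.from_pretrained(...)

# Toy sentence pair: [CLS] a b [SEP] c d [SEP] (ids are made up).
input_ids = torch.tensor([[1, 10, 11, 2, 12, 13, 2]])
token_type_ids = torch.tensor([[0, 0, 0, 0, 1, 1, 1]])

# MLM labels: predict only position 2 (original id 11); -100 is ignored by the loss.
masked_ids = input_ids.clone()
masked_ids[0, 2] = 3  # hypothetical [MASK] id
mlm_labels = torch.full_like(input_ids, -100)
mlm_labels[0, 2] = 11

# NSP label: 0 = sentence B follows sentence A, 1 = B is a random sentence.
nsp_label = torch.tensor([0])

out = model(input_ids=masked_ids, token_type_ids=token_type_ids,
            labels=mlm_labels, next_sentence_label=nsp_label)
out.loss.backward()  # MLM loss + NSP loss; an optimizer step would follow
print(out.prediction_logits.shape, out.seq_relationship_logits.shape)
# torch.Size([1, 7, 100]) torch.Size([1, 2])
```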

This looks relevant. It includes an example of an LM task; you can reuse it or build on it to create your own LM task, initialize the BERT weights from a pretrained version, and then train it on your own data.
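A hand-rolled version of such an LM task could look like this. It applies the BERT paper's masking rule (select 15% of tokens; of those, 80% become `[MASK]`, 10% a random token, 10% stay unchanged) and runs a few optimizer steps. `mask_tokens`, the toy token ids, and the random batch are hypothetical stand-ins for a real tokenizer and IMDB text:

```python
import torch
from transformers import BertConfig, BertForMaskedLM


def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids, mlm_prob=0.15):
    """Hypothetical helper: BERT-style 15% masking with the 80/10/10 split."""
    labels = input_ids.clone()
    prob = torch.full(labels.shape, mlm_prob)
    prob.masked_fill_(torch.isin(labels, torch.tensor(special_ids)), 0.0)
    selected = torch.bernoulli(prob).bool()
    labels[~selected] = -100  # loss only on selected positions

    inputs = input_ids.clone()
    # 80% of selected positions -> [MASK]
    to_mask = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    inputs[to_mask] = mask_token_id
    # half of the remaining 20% (10% overall) -> random token
    to_random = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~to_mask
    inputs[to_random] = torch.randint(vocab_size, labels.shape)[to_random]
    # the last 10% keep their original token
    return inputs, labels


torch.manual_seed(0)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
# In practice: BertForMaskedLM.from_pretrained("bert-base-uncased") and real IMDB text.
model = BertForMaskedLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

PAD, CLS, SEP, MASK = 0, 1, 2, 3  # hypothetical ids in the toy vocabulary
batch = torch.randint(4, config.vocab_size, (8, 16))
batch[:, 0] = CLS
batch[:, -1] = SEP

model.train()
losses = []
for step in range(3):
    inputs, labels = mask_tokens(batch, MASK, config.vocab_size, [PAD, CLS, SEP])
    loss = model(input_ids=inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(loss.item())
print(losses)
```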

The link @macwiatrak provided returns a 404. Could you please provide an alternative link to the same page, if it is still available?

I actually ended up using the official repo for additional pretraining and exporting to PyTorch.

Thank you for your answer!
Could you please provide the repo of your code if it is publicly available?

Thank you.

@maria I would be interested too 🙂

But that's just the official TensorFlow version; do you also have your PyTorch version available?

I used TensorFlow and then converted the result to PyTorch.
The official implementation is very user-friendly.
That's part of the reason I didn't post the solution here 🙂.
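For the TensorFlow-then-PyTorch route, transformers has shipped a helper, `load_tf_weights_in_bert`, which its BERT conversion script uses to copy an original TF checkpoint into a PyTorch model. A sketch, assuming TensorFlow is installed and a transformers version that still includes TF checkpoint loading; `convert_tf_checkpoint` and the example paths are hypothetical:

```python
def convert_tf_checkpoint(tf_checkpoint_path: str, bert_config_file: str, output_dir: str) -> None:
    """Hypothetical helper: original TF BERT checkpoint -> transformers PyTorch model."""
    # Imported lazily: TensorFlow (used by load_tf_weights_in_bert) is only
    # needed while converting, not when defining the helper.
    from transformers import BertConfig, BertForPreTraining
    from transformers.models.bert.modeling_bert import load_tf_weights_in_bert

    config = BertConfig.from_json_file(bert_config_file)  # bert_config.json from the TF run
    model = BertForPreTraining(config)                      # has both MLM and NSP heads
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
    # Writes weights + config.json; reload with BertModel/BertForMaskedLM.from_pretrained.
    model.save_pretrained(output_dir)


# Hypothetical usage (paths are placeholders for your own pre-training output):
# convert_tf_checkpoint("pretraining_output/model.ckpt-20", "bert_config.json", "bert-imdb-pt")
```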

You can use the scripts provided in the transformers library. These are well-tested and provide many useful options. For BERT in particular, you can have a look here. If you have additional questions, you can use their dedicated forum.
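The language-modeling example scripts in transformers (for example `run_mlm.py`) build their masked-LM batches with `DataCollatorForLanguageModeling`, which applies the random masking on the fly. A small offline sketch of that piece, assuming a hypothetical toy vocabulary written to a temporary `vocab.txt`; with a real checkpoint you would load the tokenizer with `BertTokenizer.from_pretrained(...)`:

```python
import os
import tempfile

import torch
from transformers import BertTokenizer, DataCollatorForLanguageModeling

# Hypothetical toy vocabulary so the example runs without downloads.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "this", "movie", "was", "great", "terrible", "plot"]
vocab_file = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_file, "w") as f:
    f.write("\n".join(vocab) + "\n")
tokenizer = BertTokenizer(vocab_file=vocab_file)

reviews = ["this movie was great", "this plot was terrible"]  # stand-ins for IMDB text
examples = [tokenizer(r) for r in reviews]  # adds [CLS] ... [SEP]

# Masks about 15% of non-special tokens; labels are -100 everywhere else.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
torch.manual_seed(0)
batch = collator(examples)
print(batch["input_ids"].shape, batch["labels"].shape)
```

The resulting `input_ids` and `labels` can be passed straight to `BertForMaskedLM`.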

Awesome! I'm glad to see examples (I didn't find any at the time, six months ago).

Hello @maria,

Did you use the scripts from the transformers library?

If you did, which do you suggest, the Hugging Face scripts or the official BERT repo?
The latter involves using TensorFlow and then exporting to PyTorch. Ideally, I would like to use only PyTorch and no TensorFlow.