BERT additional pre-training

maria · February 20, 2020, 8:26pm

I would like to use the Hugging Face transformers library to further pretrain BERT. I found the masked LM pretraining model and a usage example, but not a training example.
In the original BERT repo there is this explanation, which is great, but I would like to use PyTorch.
I’m not looking to finetune the model, just to pretrain it further on the IMDB dataset, starting from an already trained model.
Also, I assume that extracting the BERT weights without the masked LM head and loading them into a regular BERT model should not be an issue.
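
To make the last point concrete, here is a minimal sketch of dropping the masked LM head from a checkpoint. It assumes the parameter naming used by the Hugging Face masked-LM BERT class, where encoder weights are prefixed with `bert.` and head weights with `cls.`; the helper name `strip_mlm_head` and the toy checkpoint are hypothetical, and the string values stand in for real tensors.

```python
def strip_mlm_head(mlm_state_dict, encoder_prefix="bert.", head_prefix="cls."):
    """Keep only encoder entries and remove the encoder prefix from their names."""
    encoder_state = {}
    for name, tensor in mlm_state_dict.items():
        if name.startswith(head_prefix):
            continue  # masked LM head: not part of the plain encoder
        if name.startswith(encoder_prefix):
            encoder_state[name[len(encoder_prefix):]] = tensor
    return encoder_state


# Toy stand-in for a real state dict (hypothetical values instead of tensors).
toy_checkpoint = {
    "bert.embeddings.word_embeddings.weight": "W_emb",
    "bert.encoder.layer.0.attention.self.query.weight": "W_q",
    "cls.predictions.bias": "b_head",
    "cls.predictions.transform.dense.weight": "W_head",
}
encoder_only = strip_mlm_head(toy_checkpoint)
print(sorted(encoder_only))
```

In practice, loading a saved masked-LM checkpoint directory with `BertModel.from_pretrained` should perform the equivalent renaming for you, so a manual step like this is mainly useful for inspecting what gets kept.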

Jerry_Chi · February 21, 2020, 1:32am

Did you figure out how to do this?

maria · February 21, 2020, 2:15am

For now, I’m going with the original BERT repo in TensorFlow, then will try converting to PyTorch, as explained here.

krishansubudhi · February 21, 2020, 6:45am

Use this repo

This is slightly old but works. Let me know if you face any issues.

maria · February 21, 2020, 10:40am

This introduces the overhead of using Azure.
I was looking for an example using pytorch-pretrained-bert or transformers, maybe something with the model zoo: all native to PyTorch and agnostic to infrastructure.
Thanks though.

krishansubudhi · February 21, 2020, 2:59pm

This uses Hugging Face BERT. If you can set the rank and world size explicitly, then you don’t need to use Azure.
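
As a rough sketch of what "setting the rank and world size explicitly" can look like outside Azure: PyTorch's `env://` initialization reads `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` from the environment. Whether the linked repo picks these variables up unchanged is an assumption, and the helper name `distributed_env` and the default address and port are hypothetical choices for a single-machine run.

```python
import os


def distributed_env(rank, world_size, master_addr="127.0.0.1", master_port=29500):
    """Build the environment variables read by torch.distributed's env:// init."""
    return {
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
    }


# Example: process 0 of 2 on a single machine.
env = distributed_env(rank=0, world_size=2)
os.environ.update(env)
# A training script started after this point could then call
# torch.distributed.init_process_group(backend="nccl", init_method="env://")
```

Each process needs its own `RANK`, while `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` are the same for all of them.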

maria · February 22, 2020, 7:54am

Using the original Google repo and converting to torch works.
[EDIT] One caveat, though: the new model is much heavier and needs more computational time and power to use.

macwiatrak · February 24, 2020, 4:05pm

This might not be exactly what you are looking for, but it might be easier to follow the huggingface interface for finetuning (very well documented), augment it for a given task and then just use the hidden states of the BERT model to get the embeddings.

maria · February 24, 2020, 6:03pm

I specifically wanted to do the additional pretraining.
It is one of the steps used in some papers to adapt the model to the distribution of the target data.
The masked LM and next sentence classifier are available in Hugging Face, but I haven’t seen any training examples.
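
For reference, the masked LM objective itself is simple to sketch. The block below follows the masking rule from the BERT paper: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% become a random token and 10% stay unchanged. It is a plain-Python illustration, not the library's implementation; the function name `mask_tokens`, the tiny vocabulary and the per-token coin flip (a simplification of how the original code picks positions) are assumptions.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels); labels hold the original token at selected positions."""
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        # Never select special tokens; otherwise select with probability mask_prob.
        if tok in ("[CLS]", "[SEP]") or rng.random() >= mask_prob:
            inputs.append(tok)
            labels.append(None)  # position is ignored by the loss
            continue
        labels.append(tok)  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs.append("[MASK]")           # 80%: replace with the mask token
        elif roll < 0.9:
            inputs.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            inputs.append(tok)                # 10%: keep the token unchanged
    return inputs, labels


tokens = ["[CLS]", "the", "movie", "was", "surprisingly", "good", "[SEP]"]
inputs, labels = mask_tokens(tokens, vocab=["film", "bad", "plot"], rng=random.Random(0))
print(inputs)
print(labels)
```

Further pretraining then amounts to starting from the pretrained weights and minimizing the prediction loss at the selected positions on the new corpus.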

macwiatrak · February 25, 2020, 12:14am

This looks relevant. It shows an example of an LM task. You can reuse it or build on it to create your own LM task, in which you initialize the BERT weights from a pretrained version and then train on your own data.

Tykat · October 6, 2020, 11:38am

The link @macwiatrak provided returns a 404. Could you please provide an alternative link to the same page, if it is still available?

maria · October 7, 2020, 5:48am

I actually ended up using the official repo for additional pretraining and exporting to PyTorch.
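
A small sketch of the data preparation this involves, assuming the input layout described in the official BERT repo's README for its pretraining data script: plain text with one sentence per line and a blank line between documents. The helper name `to_bert_pretraining_text`, the naive regex sentence splitter and the two sample reviews are hypothetical; a proper sentence segmenter would be a better choice on real reviews.

```python
import re


def to_bert_pretraining_text(documents):
    """Format documents as one sentence per line, with a blank line between documents."""
    blocks = []
    for doc in documents:
        # Naive split: break after ., ! or ? when followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", doc.strip())
        sentences = [s.strip() for s in parts if s.strip()]
        if sentences:
            blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + "\n"


# Made-up stand-ins for IMDB reviews.
reviews = [
    "Great film. I would watch it again!",
    "Too long. The ending made no sense.",
]
corpus_text = to_bert_pretraining_text(reviews)
print(corpus_text)
```

The resulting text can be written to a file and used as the input to the repo's data-creation step.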

Tykat · October 7, 2020, 6:23pm

Hi!
Thank you for your answer!
Could you please provide the repo of your code if it is publicly available?

Thank you.

spadel · November 16, 2020, 2:50pm

@maria I would be interested too

maria · November 17, 2020, 2:50pm

spadel · November 18, 2020, 11:46am

But that’s just the official TensorFlow version. Do you also have your PyTorch version available?

maria · November 18, 2020, 12:19pm

I used TensorFlow and then converted that to PyTorch.
The official implementation is very user friendly.
That’s part of the reason I didn’t post the solution here.

BramVanroy · November 18, 2020, 12:38pm

You can use the scripts provided in the transformers library. These are well-tested and provide many useful options. For BERT in particular, you can have a look here. If you have additional questions, you can use their dedicated forum.

maria · November 18, 2020, 1:49pm

Awesome! I’m glad to see examples (didn’t find any at the time - 6mo ago).

m-nlp-q · March 12, 2021, 10:04am

Hello @maria,

Did you use the scripts of the transformers library?

If you did, which one do you suggest, Hugging Face or the official repo of BERT?
The latter involves using TensorFlow and then exporting to PyTorch. Ideally, I would like to use just PyTorch and no TensorFlow.