I would like to add a new layer in the middle of BERT-base. BERT-base has 12 encoder layers; I would like to place a new layer after the 6th encoder layer, and the output of that custom layer needs to be the input to the 7th encoder layer.
How can I implement this with pretrained BERT models?
You can simply replace the 6th encoder layer (index 5, since the layers are 0-indexed) with a wrapper module that runs the original 6th layer first and your custom layer second. For example, let's try inserting a second encoder layer at index 5:
import copy

from torch import nn
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

class CustomBertLayer(nn.Module):
    def __init__(self, bert_layer):
        super().__init__()
        self.bert_layer = bert_layer
        # Deep-copy so the extra encoder gets its own weights instead of
        # sharing them with the original layer.
        self.custom_layer = copy.deepcopy(bert_layer)

    def forward(self, *args, **kwargs):
        bert_out = self.bert_layer(*args, **kwargs)
        # A BertLayer returns a tuple (hidden_states, ...), so unpack it
        # when calling the next layer.
        return self.custom_layer(*bert_out)

model.bert.encoder.layer[5] = CustomBertLayer(model.bert.encoder.layer[5])

text = "Replace me by any text you'd like. Can't replace you!"
model(tokenizer.encode(text, return_tensors='pt'))
Now let's say you want to add a custom layer that is not an encoder. Then you'll need to adapt the flow of inputs and outputs between the layers. Let's take a look by adding a Linear layer:
from torch import nn
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

class CustomBertLayer(nn.Module):
    def __init__(self, bert_layer):
        super().__init__()
        self.bert_layer = bert_layer
        self.custom_layer = nn.Linear(768, 768)

    def forward(self, *args, **kwargs):
        bert_out = self.bert_layer(*args, **kwargs)
        # nn.Linear operates on the last dimension, so it can be applied
        # directly to the hidden states of shape (batch, seq_len, 768).
        hidden_states = self.custom_layer(bert_out[0])
        # Re-wrap in a tuple so the next encoder layer sees the format
        # it expects.
        return (hidden_states,) + bert_out[1:]

model.bert.encoder.layer[5] = CustomBertLayer(model.bert.encoder.layer[5])

text = "Replace me by any text you'd like. Can't replace you!"
model(tokenizer.encode(text, return_tensors='pt'))
In the code above, the important part is the i/o handling: the custom layer consumes the hidden states from the tuple the encoder layer returns, and its result is wrapped back into a tuple before being handed to the next encoder layer.
I don't really recommend the nn.Sequential utility for this: BERT encoder layers take and return their i/o in tuple format, (tensor, ), which Sequential doesn't handle.
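To make the tuple problem concrete, here's a minimal stand-in sketch (TupleLayer is my own illustration, not a real BertLayer): nn.Sequential feeds each module's return value straight into the next module, so the second layer receives a tuple where it expects a tensor.

```python
import torch
from torch import nn

class TupleLayer(nn.Module):
    """Stand-in that mimics a BertLayer's tuple-returning interface."""
    def forward(self, hidden_states):
        return (torch.relu(hidden_states),)  # tuple, like a BertLayer

seq = nn.Sequential(TupleLayer(), TupleLayer())
failed = False
try:
    seq(torch.ones(1, 4))
except TypeError:
    failed = True  # second layer got a tuple, not a tensor
print(failed)
```

This is why the wrapper-module approach above unpacks and re-wraps the tuple explicitly instead of chaining layers with Sequential.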
Thanks @krypticmouse, that was very helpful. I have one more requirement: my custom layer produces two outputs. One of them needs to be fed to the next BERT layer, while the other is needed for another purpose. Can you please help me with this scenario?
Then you can just return the output that BERT needs and keep the other one elsewhere (e.g. store it on the module). If you return both to the next layer, you'll need to modify the next layer to accept that input as well.
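One common pattern for this is to stash the second output on the module itself and read it back after the forward pass. Here's a minimal sketch of that idea; TwoOutputLayer, its two Linear heads, and the tiny randomly initialized config (used so the sketch runs without downloading weights) are all my own illustration, not part of the code above.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel

class TwoOutputLayer(nn.Module):
    def __init__(self, bert_layer, hidden_size):
        super().__init__()
        self.bert_layer = bert_layer
        self.main = nn.Linear(hidden_size, hidden_size)
        self.aux = nn.Linear(hidden_size, hidden_size)
        self.aux_output = None  # refreshed on every forward pass

    def forward(self, *args, **kwargs):
        bert_out = self.bert_layer(*args, **kwargs)
        hidden_states = bert_out[0]
        self.aux_output = self.aux(hidden_states)          # side output, kept
        return (self.main(hidden_states),) + bert_out[1:]  # only this goes on

# Tiny random-weight BERT so the sketch runs standalone; swap in a
# pretrained model (and layer index 5) for the real use case.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)
model.encoder.layer[0] = TwoOutputLayer(model.encoder.layer[0],
                                        config.hidden_size)

input_ids = torch.randint(0, 100, (1, 8))
out = model(input_ids)
aux = model.encoder.layer[0].aux_output  # available for the other purpose
```

After the forward pass, `aux` holds the side output while the encoder stack only ever saw the main output.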