New layer in BERT

I would like to add a new layer in the middle of BERT-BASE. BERT-BASE has 12 encoder layers; I would like to place a new layer after the 6th encoder layer, and the output of the custom layer needs to be the input to the 7th encoder layer of BERT.

How can I implement this in a pretrained BERT model?

Thanks in advance!

You can wrap the 6th encoder layer (index 5) in a new module that runs the original encoder layer first and your custom layer second; the wrapper’s output then feeds the 7th encoder layer (index 6). For example, let’s try adding another encoder after layer index 5:

import copy
from torch import nn
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

class CustomBertLayer(nn.Module):
    def __init__(self, bert_layer):
        super().__init__()
        self.bert_layer = bert_layer
        # An independent copy of the encoder layer stands in for the custom layer
        self.custom_layer = copy.deepcopy(bert_layer)

    def forward(self, *x):
        bert_out = self.bert_layer(*x)       # BertLayer returns a tuple: (hidden_states,)
        return self.custom_layer(*bert_out)  # unpack it into the custom layer

model.bert.encoder.layer[5] = CustomBertLayer(model.bert.encoder.layer[5])


text = "Replace me by any text you'd like. Can't replace you!"
model(tokenizer.encode(text, return_tensors='pt'))
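
A quick sanity check (assuming transformers v4+, where the model returns a SequenceClassifierOutput): printing the replaced slot shows the wrapper, and the forward pass still yields logits of the usual shape:

print(model.bert.encoder.layer[5])  # shows CustomBertLayer wrapping the original layer

out = model(tokenizer.encode(text, return_tensors='pt'))
print(out.logits.shape)  # torch.Size([1, 2]) with the default 2-label head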

Now let’s say you want to add a custom layer that is not an encoder. Then you’ll need to modify the flow of inputs and outputs accordingly. Let’s take a look by adding a linear layer:

from torch import nn
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

class CustomBertLayer(nn.Module):
    def __init__(self, bert_layer):
        super().__init__()
        self.bert_layer = bert_layer
        self.custom_layer = nn.Linear(768, 768)  # hidden_size -> hidden_size

    def forward(self, *x):
        bert_out = self.bert_layer(*x)            # (hidden_states,) tuple
        return (self.custom_layer(bert_out[0]),)  # re-wrap in a tuple for the encoder

model.bert.encoder.layer[5] = CustomBertLayer(model.bert.encoder.layer[5])


text = "Replace me by any text you'd like. Can't replace you!"
model(tokenizer.encode(text, return_tensors='pt'))

In the code above, nn.Linear applies over the last dimension, so the hidden states of shape (batch, seq_len, hidden_size) can be passed straight through; the only adjustment is re-wrapping the result in a tuple, because BertEncoder expects every layer to return a tuple and reads the hidden states from its first element.
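
To make that tuple convention concrete, here is a minimal sketch (reusing the model from above) that probes an untouched layer:

import torch

# BertLayer consumes hidden states of shape (batch, seq_len, hidden_size)
# and returns a tuple whose first element has the same shape
hidden = torch.rand(1, 8, 768)
layer_out = model.bert.encoder.layer[0](hidden)
print(type(layer_out))     # <class 'tuple'>
print(layer_out[0].shape)  # torch.Size([1, 8, 768])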

I don’t really recommend the nn.Sequential utility for this: a BertLayer takes a tensor but returns its output in tuple format, (tensor,), so naively chaining layers breaks.
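
A sketch of the failure mode:

# nn.Sequential pipes each module's raw output into the next module, so the
# second BertLayer would receive a tuple instead of a tensor and fail inside
# its attention block
seq = nn.Sequential(model.bert.encoder.layer[0], model.bert.encoder.layer[1])
# seq(torch.rand(1, 8, 768))  # raises: a Linear inside gets a tuple, not a Tensor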

Thanks @krypticmouse, that was very helpful. I have one more requirement: my custom layer produces two outputs, and one of them needs to be fed to the next BERT layer while the other is required for another purpose. Can you please help me with this scenario?

Then you can just return the output that BERT needs. If you want the next layer to see both, you’ll need to modify the next layer to handle that input as well.

If I return both outputs, how can I feed them to the next BERT layer? Is this possible?

Returning both directly doesn’t work, because BertEncoder only passes the first element of each layer’s output tuple on to the next layer (the next positional argument is the attention mask). Instead, stash the second output on the wrapper, something like this:

class CustomBertLayer(nn.Module):
    def __init__(self, bert_layer):
        super().__init__()
        self.bert_layer = bert_layer
        self.custom_layer = CustomLayer()  # your own two-output module
        self.extra_output = None           # side output is stashed here

    def forward(self, *x):
        bert_out = self.bert_layer(*x)
        out_1, out_2 = self.custom_layer(bert_out[0])
        self.extra_output = out_2  # keep the second output for later
        return (out_1,)            # only the first output re-enters the encoder

model.bert.encoder.layer[5] = CustomBertLayer(model.bert.encoder.layer[5])

But now you’ll also have to modify model.bert.encoder.layer[6] so it receives and processes that stashed output, like:

class CustomBertLayer2(nn.Module):
    def __init__(self, bert_layer, producer):
        super().__init__()
        self.bert_layer = bert_layer
        self.producer = producer  # the CustomBertLayer at index 5

    def forward(self, hidden_states, *args, **kwargs):
        out_2 = self.producer.extra_output  # side output from the custom layer
        # do whatever with out_2 here
        return self.bert_layer(hidden_states, *args, **kwargs)

model.bert.encoder.layer[6] = CustomBertLayer2(
    model.bert.encoder.layer[6], model.bert.encoder.layer[5]
)
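
Putting it together (a sketch; CustomLayer above stands in for your own two-output module), a forward pass runs as usual and the side output can be read off afterwards:

out = model(tokenizer.encode(text, return_tensors='pt'))
side_output = model.bert.encoder.layer[5].extra_output  # the stashed second output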

@krypticmouse Thanks a lot!