I was trying to understand how the BERT code in the transformers library works, so I compared the output of the following two snippets. Both use the same model:
from transformers import BertModel
self.bert = BertModel.from_pretrained('bert-base-uncased')
Code 1:
_, pooled_output = self.bert(input_ids, token_type_ids, attention_mask)
Code 2:
# run the embeddings, then each of the 12 encoder layers in turn, then the pooler
x = self.bert.embeddings(input_ids, token_type_ids, attention_mask)
for i in range(12):
    x = self.bert.encoder.layer[i](x)[0]  # [0] selects the layer's hidden states
pooled_output = self.bert.pooler(x)
However, the outputs of the two methods are not the same. (I used model.eval(), so the dropout layers should not affect the output.)
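For concreteness, the difference can be checked like this (a minimal sketch; pooled_1 and pooled_2 stand for the two pooled_output tensors above, renamed so that both can be kept around):
# element-wise comparison of the two pooled outputs
print(torch.allclose(pooled_1, pooled_2, atol=1e-6))  # prints False
print((pooled_1 - pooled_2).abs().max())              # size of the largest gap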
Finally, I should mention that I tried fixing the random seed so that I would get reproducible results:
import random
import numpy as np
import torch

seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
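For completeness, the cuDNN determinism flags can be set as well (just a precaution; I have not confirmed that they affect an eval-mode forward pass):
# extra flags sometimes used for deterministic GPU behaviour (a precaution;
# not confirmed to matter for this eval-mode forward pass)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False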
Did I miss something?