I was trying to understand how the BERT code in the transformers library works, so I compared the output of the following two snippets. Both use the same model:
from transformers import BertModel
self.bert = BertModel.from_pretrained('bert-base-uncased')
Code 1:
_, pooled_output = self.bert(input_ids, token_type_ids, attention_mask)
Code 2:
# run the embeddings, then each of the 12 encoder layers in turn, then the pooler
x = self.bert.embeddings(input_ids, token_type_ids, attention_mask)
for i in range(12):
    x = self.bert.encoder.layer[i](x)[0]  # [0] selects the layer's hidden states
pooled_output = self.bert.pooler(x)
However, the outputs of the two methods are not the same. (I used model.eval(), so the dropout layers should not affect the output.)
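For concreteness, the difference can be checked like this (a minimal sketch; pooled_1 and pooled_2 stand for the two pooled_output tensors above, renamed so that both can be kept around):
# element-wise comparison of the two pooled outputs
print(torch.allclose(pooled_1, pooled_2, atol=1e-6))  # prints False
print((pooled_1 - pooled_2).abs().max())              # size of the largest gap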
Finally, I should mention that I tried fixing the random seed so that I would get reproducible results:
import random
import numpy as np
import torch

seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
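For completeness, the cuDNN determinism flags can be set as well (just a precaution; I have not confirmed that they affect an eval-mode forward pass):
# extra flags sometimes used for deterministic GPU behaviour (a precaution;
# not confirmed to matter for this eval-mode forward pass)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False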
Did I miss something?