I was trying to understand how the BERT code from the transformers library works, so I compared the outputs of the following two snippets:
self.bert = BertModel.from_pretrained('bert-base-uncased')
_ , pooled_output = self.bert(input_ids, token_type_ids, attention_mask)
x = self.bert.embeddings(input_ids, token_type_ids, attention_mask)
for i in range(0, 12):
    x = self.bert.encoder.layer[i](x)
pooled_output = self.bert.pooler(x)
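In miniature, the comparison I am attempting looks like this (a stdlib-only sketch with hypothetical toy layers standing in for BERT's modules; if the real forward() applies any step that the naive layer loop skips, the two paths will disagree):

```python
# Hypothetical toy "layers" standing in for BERT's sub-modules.
def embeddings(x):
    return [v + 1 for v in x]

def layer(x):
    return [v * 2 for v in x]

def pooler(x):
    return sum(x)

def full_forward(x):
    # The "official" forward pass: embeddings -> 12 layers -> pooler.
    h = embeddings(x)
    for _ in range(12):
        h = layer(h)
    return pooler(h)

# Manual re-implementation, mirroring what I do with self.bert.* above.
x = [1, 2, 3]
h = embeddings(x)
for _ in range(12):
    h = layer(h)
pooled = pooler(h)

print(pooled == full_forward(x))  # True: both paths run identical steps
```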
However, the outputs of the two methods are not the same. (I used model.eval(), so the dropout layers should not affect the output.)
Finally, I should mention that I tried fixing the seed to get reproducible results:
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
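For what it's worth, seeding does make random draws repeatable; a quick stdlib-only check (the numpy and torch calls above behave analogously for their own generators):

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

# Re-seeding with the same value restarts the same sequence.
random.seed(42)
second = [random.random() for _ in range(3)]

print(first == second)  # True: same seed, same sequence
```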
Did I miss something?