I am using the pretrained BERT model by Hugging Face from https://github.com/huggingface/pytorch-pretrained-BERT to get word embeddings by extracting the hidden states.
However, the embedding vectors I get seem to be the same no matter which layer I choose, and the Flair implementation, for example, gives different results when computing cosine similarity between different words.

What am I doing wrong? Is there any processing I have to do on the hidden states to convert them to vectors, such as normalising them?
    tokens_tensor_1 = torch.tensor([indexed_tokens_1])
    segments_tensors_1 = torch.tensor([segments_ids_1])
    tokens_tensor_1 = tokens_tensor_1.to('cuda')
    segments_tensors_1 = segments_tensors_1.to('cuda')

    with torch.no_grad():
        hidden_states_1, _ = model(tokens_tensor_1)  # , segments_tensors_1)

    print("tokenized_text_1:", tokenized_text_1)

    vectors = []
    for index in range(len(tokenized_text_1)):
        torch_vector = hidden_states_1[index]
        torch_vector = torch_vector.to('cpu')
        numpy_vector = torch_vector.numpy()
        vectors.append(numpy_vector)
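For what it's worth, cosine similarity itself should not need any normalisation of the hidden states, since it divides by the vector norms internally. This is the comparison I am running on the extracted NumPy vectors (the helper function is my own, not part of BERT or Flair):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sanity checks: identical vectors -> 1.0, orthogonal vectors -> 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

So scaling a vector does not change its similarity score, which is why I expected the per-layer vectors to differ regardless of any normalisation step.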