I am trying to better understand how the RoBERTa model (from huggingface transformers) works.
My batch_size is 64.
My RoBERTa model looks like this:
roberta = RobertaModel.from_pretrained(config['model'])
roberta.config.max_position_embeddings = config['max_input_length']
RobertaConfig {
"architectures": [
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": "1024",
"model_type": "roberta",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
In my model's forward() method I have tried:
embedded , x = self.roberta(text)
From what I read in the huggingface documentation and source code, the output of self.roberta(text) should be prediction_scores, of shape (batch_size, sequence_length, config.vocab_size). While checking the source code I also came across this:
outputs = (prediction_scores,) + outputs[2:] # Add hidden states and attention if they are here
From my understanding, I should get only one output, embedded, with shape torch.Size([64, 1024, 50265]). Instead, I am getting two tensors, embedded and x, with the following shapes:
torch.Size([64, 1024, 768])
torch.Size([64, 768])
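To make the mismatch concrete, here is a minimal sketch that reproduces the same pattern of shapes. It assumes a tiny, randomly initialised RobertaConfig (the sizes below are hypothetical and chosen only so nothing needs to be downloaded; they are not my real settings), and it assumes that the prediction_scores description I found may belong to RobertaForMaskedLM rather than to the bare RobertaModel, so it runs both for comparison.

```python
import torch
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaModel

# Hypothetical tiny config so the example runs offline; sizes are illustrative only.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=40,
)

batch_size, sequence_length = 2, 10
# Random token ids, starting at 3 to stay clear of the special ids (pad_token_id is 1).
input_ids = torch.randint(3, config.vocab_size, (batch_size, sequence_length))

base_model = RobertaModel(config).eval()        # bare encoder, as in my code
mlm_model = RobertaForMaskedLM(config).eval()   # encoder plus language-modelling head

with torch.no_grad():
    base_outputs = base_model(input_ids)
    mlm_outputs = mlm_model(input_ids)

# Integer indexing works whether the library returns a plain tuple or a ModelOutput.
sequence_output = base_outputs[0]    # (batch_size, sequence_length, hidden_size)
pooled_output = base_outputs[1]      # (batch_size, hidden_size)
prediction_scores = mlm_outputs[0]   # (batch_size, sequence_length, vocab_size)

print(sequence_output.shape)
print(pooled_output.shape)
print(prediction_scores.shape)
```

With these assumed sizes, the first two shapes follow the same layout as my [64, 1024, 768] and [64, 768] tensors, while only the masked-LM variant produces a last dimension equal to vocab_size.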
I have checked, and both roberta.config.output_hidden_states and roberta.config.output_attentions are false.
So my questions are: why am I getting two outputs, why do they have these shapes, and what do they represent?