Output of RoBERTa (huggingface transformers)

Romina_Baila · June 13, 2020, 12:34pm

Hello,
I am trying to better understand how the RoBERTa model (from huggingface transformers) works.

My batch_size is 64.

My RoBERTa model looks like this:

roberta = RobertaModel.from_pretrained(config['model'])
roberta.config.max_position_embeddings = config['max_input_length']
print(roberta.config)

RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": "1024",
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

In my model's forward() method, I have tried:

embedded, x = self.roberta(text)

Now, from what I read in the documentation and source code from huggingface, the output of self.roberta(text) should be

prediction_scores ( torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size) )

(Checking the source code, I also came across this:
outputs = (prediction_scores,) + outputs[2:] # Add hidden states and attention if they are here)
From my understanding, I should get only one output, embedded, with the following shape: torch.Size([64, 1024, 50265]). Instead, I am getting two tensors, embedded and x, with the following shapes:

torch.Size([64, 1024, 768])
torch.Size([64, 768])

I have checked, and both roberta.config.output_hidden_states and roberta.config.output_attentions are false.

So my questions are: why am I getting two outputs, why do they have these shapes, and what do they represent?

Krish · June 13, 2020, 4:43pm

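The first one is the output of the last layer of the model (the last hidden state), with shape (batch_size, sequence_length, hidden_size). It can be used for token classification.
The second one is the pooled output, with shape (batch_size, hidden_size). It can be used for sequence classification.
The prediction_scores output you found in the documentation belongs to RobertaForMaskedLM, not to RobertaModel, which is why you do not get a tensor whose last dimension is vocab_size.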
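
To make the shapes concrete, here is a minimal sketch that compares the bare RobertaModel with RobertaForMaskedLM. It assumes a tiny, randomly initialised configuration (the sizes below are made up for illustration and are not the ones from the question) so that nothing needs to be downloaded. It also checks that the pooled output appears to be the hidden state of the first token passed through the model's pooler (a dense layer followed by tanh).

```python
import torch
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaModel

# Tiny, randomly initialised config (hypothetical sizes) so nothing is downloaded.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=40,
)

batch_size, seq_len = 4, 16
# Random token ids; start at 3 to avoid the special ids 0 (<s>), 1 (<pad>), 2 (</s>).
input_ids = torch.randint(3, config.vocab_size, (batch_size, seq_len))

base = RobertaModel(config).eval()        # bare encoder, as in the question
mlm = RobertaForMaskedLM(config).eval()   # encoder plus language-modelling head

with torch.no_grad():
    base_out = base(input_ids)
    mlm_out = mlm(input_ids)

# Integer indexing works whether the model returns a tuple or an output object.
last_hidden_state = base_out[0]   # (batch_size, seq_len, hidden_size)
pooled_output = base_out[1]       # (batch_size, hidden_size)
prediction_scores = mlm_out[0]    # (batch_size, seq_len, vocab_size)

print(last_hidden_state.shape)
print(pooled_output.shape)
print(prediction_scores.shape)

# Recompute the pooled output by hand: first token -> dense layer -> tanh.
with torch.no_grad():
    manual_pooled = torch.tanh(base.pooler.dense(last_hidden_state[:, 0]))
print(torch.allclose(pooled_output, manual_pooled, atol=1e-6))
```

With the configuration from the question (batch size 64, sequence length 1024, hidden size 768), the same two base-model outputs would have the shapes [64, 1024, 768] and [64, 768] reported above, and only the masked-LM variant would produce a last dimension of 50265.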