Questions about the Transformers NER usage example

lazy_like_a_fox · May 24, 2020, 6:50am

I’m just learning PyTorch and was following the NER example at https://huggingface.co/transformers/usage.html#named-entity-recognition, which produces this output:

[
    {'word': 'Hu', 'score': 0.9995632767677307, 'entity': 'I-ORG'},
    {'word': '##gging', 'score': 0.9915938973426819, 'entity': 'I-ORG'},
    {'word': 'Face', 'score': 0.9982671737670898, 'entity': 'I-ORG'},
    {'word': 'Inc', 'score': 0.9994403719902039, 'entity': 'I-ORG'},
    {'word': 'New', 'score': 0.9994346499443054, 'entity': 'I-LOC'},
    {'word': 'York', 'score': 0.9993270635604858, 'entity': 'I-LOC'},
    {'word': 'City', 'score': 0.9993864893913269, 'entity': 'I-LOC'},
    {'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
    {'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
    {'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},
    {'word': 'Manhattan', 'score': 0.9758241176605225, 'entity': 'I-LOC'},
    {'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
]

Shouldn’t the first entity be labeled ‘B-ORG’?
Do I need to write my own function to stitch split words like ‘Hugging’ back together, or is there something prebuilt for that?

ptrblck · May 25, 2020, 2:57am

This question seems to be specific to the documentation of Hugging Face’s Transformers library.
CC @Thomas_Wolf to answer this specific question or to refer to some documentation.
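
On the second question, stitching the pieces back together by hand is not much code. Below is a minimal sketch, assuming the output is a list of dicts with 'word', 'score' and 'entity' keys as shown above, and that a leading '##' marks a WordPiece continuation of the previous token. The function name `merge_wordpieces` and the choice to keep the lowest piece score are my own assumptions, not anything from the library.

```python
from typing import Dict, List


def merge_wordpieces(tokens: List[Dict]) -> List[Dict]:
    """Glue '##'-prefixed WordPiece continuations onto the preceding token.

    Hypothetical helper: the merged word keeps the entity label of its first
    piece and the lowest score among its pieces (a conservative choice).
    """
    merged: List[Dict] = []
    for tok in tokens:
        word = tok["word"]
        if word.startswith("##") and merged:
            # Continuation piece: append it (minus the '##') to the last word.
            prev = merged[-1]
            prev["word"] += word[2:]
            prev["score"] = min(prev["score"], tok["score"])
        else:
            # Start of a new word. A stray '##' piece with nothing before it
            # is kept unchanged rather than guessed at.
            merged.append(
                {"word": word, "score": tok["score"], "entity": tok["entity"]}
            )
    return merged


# A few tokens copied from the output above, with scores rounded for brevity.
sample = [
    {"word": "Hu", "score": 0.9996, "entity": "I-ORG"},
    {"word": "##gging", "score": 0.9916, "entity": "I-ORG"},
    {"word": "Face", "score": 0.9983, "entity": "I-ORG"},
    {"word": "D", "score": 0.9826, "entity": "I-LOC"},
    {"word": "##UM", "score": 0.9370, "entity": "I-LOC"},
    {"word": "##BO", "score": 0.8987, "entity": "I-LOC"},
]

print([t["word"] for t in merge_wordpieces(sample)])
# ['Hugging', 'Face', 'DUMBO']
```

This only rejoins sub-word pieces. It does not group whole words into entities such as 'Hugging Face Inc', because the output above carries no token positions, so adjacent list items with the same label (for example 'City' and 'D') are not necessarily adjacent in the text. Later releases of Transformers document an `aggregation_strategy` argument on the token-classification pipeline that does the grouping for you, so it is worth checking the docs for the version you have installed.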