- NER (named-entity recognition) tagging = identifying the named entities in a natural-language passage
- “Georg Wilhelm Friedrich Hegel”
- “10 kilometers”
- “January”
- “The Soviet Union”
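NER output is usually represented as token-level tags in the BIO scheme (B- opens an entity, I- continues it, O is outside). A minimal sketch of what tagged output for the examples above might look like, assuming OntoNotes-style labels (PERSON, DATE); the `decode_bio` helper is illustrative, not from any particular library:

```python
# Token-level BIO tags: B- opens an entity, I- continues it, O is outside.
tokens = ["Georg", "Wilhelm", "Friedrich", "Hegel", "wrote", "in", "January", "."]
tags = ["B-PERSON", "I-PERSON", "I-PERSON", "I-PERSON", "O", "O", "B-DATE", "O"]

def decode_bio(tokens, tags):
    """Collapse BIO tags back into (entity_text, label) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

print(decode_bio(tokens, tags))
# → [('Georg Wilhelm Friedrich Hegel', 'PERSON'), ('January', 'DATE')]
```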
- Obviously an essential step in many tasks involving web scrapes and other bulk unstructured text
- Today, somewhat more niche to do as a stand-alone task
- LLMs can handle both the NER task and whatever downstream task the NER was supporting, often zero-shot
- Stand-alone approaches still useful when cost, latency, or data volume makes per-document LLM calls impractical
- State of the art is fine-tuned LLMs, typically BERT variants
- Commonly accessed via spaCy
- As with POS tagging (which often precedes NER, especially in older techniques):
- Started out with manually written grammar-based rules, from which syntax trees could be built
- Progressed to HMMs in the 1980s
- Eventually RNNs
- Now Transformers like BERT
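The HMM stage of this progression can be sketched as Viterbi decoding over BIO states, where hidden states are tags and observations are tokens. All probabilities below are invented toy numbers for illustration, not trained values:

```python
import math

# Toy HMM for NER: hidden states are BIO tags, observations are tokens.
# All probabilities are invented for illustration.
states = ["O", "B-PER", "I-PER"]
start = {"O": 0.8, "B-PER": 0.2, "I-PER": 0.0}
trans = {
    "O":     {"O": 0.8, "B-PER": 0.2, "I-PER": 0.0},
    "B-PER": {"O": 0.3, "B-PER": 0.1, "I-PER": 0.6},
    "I-PER": {"O": 0.4, "B-PER": 0.1, "I-PER": 0.5},
}
emit = {
    "O":     {"met": 0.5, "in": 0.5},
    "B-PER": {"Georg": 0.7, "Hegel": 0.3},
    "I-PER": {"Hegel": 0.9, "Georg": 0.1},
}

def viterbi(tokens):
    """Return the most probable BIO tag sequence (log-space Viterbi)."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialization: start probability times emission of the first token
    # (unknown tokens get a small smoothing probability)
    v = [{s: logp(start[s]) + logp(emit[s].get(tokens[0], 1e-6)) for s in states}]
    back = []
    for tok in tokens[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + logp(trans[p][s]))
            col[s] = v[-1][best] + logp(trans[best][s]) + logp(emit[s].get(tok, 1e-6))
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    # Backtrace from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["I", "met", "Georg", "Hegel"]))
# → ['O', 'O', 'B-PER', 'I-PER']
```

Modern taggers (BiLSTM-CRF, then BERT with a token-classification head) replace the hand-set transition and emission tables with learned parameters, but the decode-a-tag-sequence framing is the same.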