🔫 Zero-shot NER with Flair#

In this tutorial you will learn how to use Argilla to analyze and validate NER predictions from the new zero-shot model provided by the Flair NLP library.

  • 🛠 Useful for quickly bootstrapping a training set (using Argilla Annotation Mode) as well as integrating with weak-supervision workflows.

  • We will use a challenging, exciting dataset: wnut_17 (more info below).

  • 🔮 You will be able to see and work with the obtained predictions.


Introduction#

This tutorial shows you how to combine Named Entity Recognition (NER), Flair and Argilla. But what is NER?

According to Analytics Vidhya, “NER is a natural language processing technique that can automatically scan entire articles and pull out some fundamental entities in a text and classify them into predefined categories”. These entities can be names, quantities, dates and times, amounts of money/currencies, and much more.

Flair, in turn, is a library that makes it easy to apply NLP models to NER and other NLP tasks in many different languages. It is both powerful and intuitive to use.

Thanks to these resources and the Annotation Mode of Argilla, we can quickly build up a dataset to train a domain-specific model.

Setup#

For this tutorial we also need the third-party libraries datasets and flair, which can be installed via pip:

[ ]:
%pip install "datasets~=2.6.0" "flair~=0.11.0" -qqq

1. Load the wnut_17 dataset#

In this example, we’ll use a challenging NER dataset, “WNUT 17: Emerging and Rare entity recognition”, which focuses on unusual, previously unseen entities in the context of emerging discussions. This dataset is useful for getting a sense of the quality of our zero-shot predictions.

Let’s load the test set from the Hugging Face Hub:

[ ]:
from datasets import load_dataset

# download data set
dataset = load_dataset("wnut_17", split="test")

[ ]:
# define labels
labels = ["corporation", "creative-work", "group", "location", "person", "product"]
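
The label list above corresponds to the entity types behind the dataset's BIO tags. As a minimal sketch, assuming the tag names follow the usual `B-`/`I-` prefix scheme, the entity types can be derived from the tag names instead of being typed by hand. The helper name `entity_labels_from_bio` is hypothetical, and the tag names are written out here so the snippet runs without downloading anything:

```python
def entity_labels_from_bio(tag_names):
    """Return the sorted entity types found in a list of BIO tag names."""
    labels = set()
    for tag in tag_names:
        if tag == "O":
            continue  # "O" marks tokens outside any entity
        # split only on the first hyphen so "creative-work" stays intact
        _prefix, label = tag.split("-", 1)
        labels.add(label)
    return sorted(labels)


# Tag names written out by hand for illustration; with the dataset loaded
# above they would typically come from
# dataset.features["ner_tags"].feature.names
wnut_tag_names = [
    "O",
    "B-corporation", "I-corporation",
    "B-creative-work", "I-creative-work",
    "B-group", "I-group",
    "B-location", "I-location",
    "B-person", "I-person",
    "B-product", "I-product",
]

derived_labels = entity_labels_from_bio(wnut_tag_names)
print(derived_labels)
```

Deriving the labels this way keeps the zero-shot task definition in sync with the dataset if you later switch to a different corpus.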

2. Configure Flair TARSTagger#

Now let’s configure our NER model, following Flair’s documentation:

[ ]:
from flair.models import TARSTagger

# load zero-shot NER tagger
tars = TARSTagger.load("tars-ner")

# define labels for named entities using wnut labels
tars.add_and_switch_to_new_task("task 1", labels, label_type="ner")

Let’s test it with one example!

[ ]:
from flair.data import Sentence

# wrap our tokens in a flair Sentence
sentence = Sentence(" ".join(dataset[0]["tokens"]))

[ ]:
# add predictions to our sentence
tars.predict(sentence)

# extract predicted entities into a list of tuples (label, start_char, end_char)
[
    (entity.get_labels()[0].value, entity.start_position, entity.end_position)
    for entity in sentence.get_spans("ner")
]
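
Each tuple holds a label plus character offsets into the sentence text. The sketch below shows how such tuples map back to the text they cover, assuming the end offset is exclusive, as in Python slicing. The helper `spans_to_surface` is hypothetical, and the spans are hand-written for illustration, not actual model output:

```python
def spans_to_surface(text, spans):
    """Map (label, start_char, end_char) tuples to (label, surface string)."""
    return [(label, text[start:end]) for label, start, end in spans]


# Hand-written example spans (not predictions from the tagger)
example_text = "Paris is the capital of France"
example_spans = [("location", 0, 5), ("location", 24, 30)]

surface = spans_to_surface(example_text, example_spans)
print(surface)
```

Printing the surface strings is a quick sanity check that the offsets line up with the text before any records are logged.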

3. Predict over wnut_17 and log to Argilla#

Now, let’s log the predictions in Argilla:

[ ]:
import argilla as rg

# build records for the first 100 examples
records = []
for record in dataset.select(range(100)):
    input_text = " ".join(record["tokens"])

    sentence = Sentence(input_text)
    tars.predict(sentence)
    prediction = [
        (entity.get_labels()[0].value, entity.start_position, entity.end_position)
        for entity in sentence.get_spans("ner")
    ]

    # building TokenClassificationRecord
    records.append(
        rg.TokenClassificationRecord(
            text=input_text,
            tokens=[token.text for token in sentence],
            prediction=prediction,
            prediction_agent="tars-ner",
        )
    )

# log the records to Argilla
rg.log(records, name="tars_ner_wnut_17", metadata={"split": "test"})

You can now explore the predictions in Argilla. In Annotation Mode, you can change, add, validate or discard the predicted entities. Statistics are also available to help you monitor your records.
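
To get a rough sense of prediction quality, the predicted spans could also be compared with the gold annotations that ship with wnut_17. The following is a minimal sketch, not part of the original tutorial: the hypothetical helper `bio_to_char_spans` converts token-level BIO tags into the same `(label, start_char, end_char)` format used for predictions, assuming the text was built with `" ".join(tokens)` as above. The tokens, tags and predicted spans are made up for illustration.

```python
def bio_to_char_spans(tokens, tags):
    """Convert token-level BIO tags to (label, start_char, end_char) spans.

    Offsets refer to the text built with " ".join(tokens).
    """
    spans = []
    current = None  # [label, start_char, end_char] of the open entity
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        offset = end + 1  # skip the joining space
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[2] = end  # extend the open entity
            continue
        if current is not None:
            spans.append(tuple(current))
            current = None
        if tag != "O":
            # a B- tag (or a stray I- tag) opens a new entity
            current = [tag[2:], start, end]
    if current is not None:
        spans.append(tuple(current))
    return spans


# Made-up example; in the dataset, ner_tags are integer ids that would first
# be mapped to names via dataset.features["ner_tags"].feature.names
tokens = ["I", "love", "New", "York", "and", "Flair"]
tags = ["O", "O", "B-location", "I-location", "O", "B-product"]

gold = bio_to_char_spans(tokens, tags)
predicted = [("location", 7, 15), ("person", 20, 25)]  # hypothetical predictions

# exact-match comparison: label and both offsets must agree
correct = set(gold) & set(predicted)
print(gold)
print(f"{len(correct)} of {len(predicted)} predicted spans match the gold spans")
```

Spans in this format could, in principle, also be passed as the `annotation` argument of `rg.TokenClassificationRecord`, provided they align with the token boundaries of the record.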

Summary#

Getting predictions with a zero-shot approach can be very helpful to guide humans in their annotation process. Especially for NER tasks, Argilla makes it very easy to explore and correct those predictions thanks to its Annotation Mode 😎.

Next steps#

⭐ Star the Argilla GitHub repo to stay updated.

📚 Argilla documentation for more guides and tutorials.

🙋‍♀️ Join the Argilla community! A good place to start is the discussion forum.