Open In Colab  View Notebook on GitHub

🔫 Evaluate a zero-shot NER with Flair#

In this tutorial, you will learn how to analyze and validate NER predictions from the new zero-shot model provided by the Flair NLP library with Argilla.

  • 🛠 Useful for quickly bootstrapping a training set (using Argilla Annotation Mode) as well as integrating with weak-supervision workflows.

  • We will use a challenging, exciting dataset: wnut_17 (more info below).

  • 🔮 You will be able to see and work with the obtained predictions.



This tutorial will show you how to work with Named Entity Recognition (NER), Flair and Argilla. But, what is NER?

According to Analytics Vidhya, “NER is a natural language processing technique that can automatically scan entire articles and pull out some fundamental entities in a text and classify them into predefined categories”. These entities can be names, quantities, dates and times, amounts of money/currencies, and much more.
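To make the idea concrete, here is a minimal sketch of how NER output is commonly represented: a list of (label, start_char, end_char) tuples over the input text. The sentence and the spans below are invented for illustration and are not produced by any model.

```python
# Hypothetical sentence with hand-written entity spans (not model output).
text = "Barack Obama visited Paris in 2015"
entities = [
    ("person", 0, 12),     # characters 0..11 -> "Barack Obama"
    ("location", 21, 26),  # characters 21..25 -> "Paris"
]


def show_entities(text, entities):
    """Return (label, surface_text) pairs for each character span."""
    return [(label, text[start:end]) for label, start, end in entities]


for label, surface in show_entities(text, entities):
    print(f"{label:>10}: {surface}")
```

The same tuple layout is used later in this tutorial for the predictions extracted from Flair.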

Flair, in turn, is a library that makes it easy to apply NLP models to NER and other NLP tasks in many different languages. It is both powerful and intuitive.

Thanks to these resources and the Annotation Mode of Argilla, we can quickly build up a data set to train a domain-specific model.

Running Argilla#

For this tutorial, you will need to have an Argilla server running. There are two main options for deploying and running Argilla:

Deploy Argilla on Hugging Face Spaces: If you want to run tutorials with external notebooks (e.g., Google Colab) and you have an account on Hugging Face, you can deploy Argilla on Spaces with a few clicks:

deploy on spaces

For details about configuring your deployment, check the official Hugging Face Hub guide.

Launch Argilla using Argilla's quickstart Docker image: This is the recommended option if you want Argilla running on your local machine. Note that this option will only let you run the tutorial locally and not with an external notebook service.

For more information on deployment options, please check the Deployment section of the documentation.


This tutorial is a Jupyter Notebook. There are two options to run it:

  • Use the Open in Colab button at the top of this page. This option allows you to run the notebook directly on Google Colab. Don't forget to change the runtime type to GPU for faster model training and inference.

  • Download the .ipynb file by clicking on the View source link at the top of the page. This option allows you to download the notebook and run it on your local machine or on a Jupyter Notebook tool of your choice.


For this tutorial, you'll need to install the Argilla client and a few third-party libraries using pip:

[ ]:
%pip install "argilla" "datasets~=2.6.0" "flair~=0.11.0" -qqq

Let's import the Argilla module for reading and writing data:

[ ]:
import argilla as rg

If you are running Argilla using the Docker quickstart image or Hugging Face Spaces, you need to init the Argilla client with the URL and API_KEY:

[ ]:
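# Replace api_url with the url to your HF Spaces URL if using Spaces
# Replace api_key if you configured a custom API key
# Replace workspace with the name of your workspace
rg.init(
    api_url="http://localhost:6900",  # default URL of the local quickstart image; adjust if yours differs
    api_key="admin.apikey",
    workspace="admin",
)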

If you're running a private Hugging Face Space, you will also need to set the HF_TOKEN as follows:

[ ]:
# # Set the HF_TOKEN environment variable
# import os
# os.environ['HF_TOKEN'] = "your-hf-token"

# # Replace api_url with the url to your HF Spaces URL
# # Replace api_key if you configured a custom API key
# # Replace workspace with the name of your workspace
# rg.init(
#     api_url="https://[your-owner-name]-[your_space_name]",
#     api_key="admin.apikey",
#     workspace="admin",
#     extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
# )

Finally, let's include the imports we need:

[ ]:
from datasets import load_dataset
from flair.models import TARSTagger
from flair.data import Sentence

Enable Telemetry#

We gain valuable insights from how you interact with our tutorials. Running the following lines of code helps us understand whether this tutorial is serving you effectively, so that we can offer you the most suitable content. The data is entirely anonymous, and you can skip this step if you prefer. For more info, please check out the Telemetry page.

[ ]:
try:
    from argilla.utils.telemetry import tutorial_running

    tutorial_running()
except ImportError:
    print("Telemetry is introduced in Argilla 1.20.0 and not found in the current installation. Skipping telemetry.")

1. Load the wnut_17 dataset#

In this example, we'll use a challenging NER dataset, “WNUT 17: Emerging and Rare entity recognition”, which focuses on unusual, previously unseen entities in the context of emerging discussions. This dataset is useful for getting a sense of the quality of our zero-shot predictions.

Let's load the test set from the Hugging Face Hub:

[ ]:
# Download data set
dataset = load_dataset("wnut_17", split="test")

# Define labels
labels = ["corporation", "creative-work", "group", "location", "person", "product"]
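The predictions in this tutorial are character spans over the space-joined tokens, whereas NER datasets of this kind usually store gold annotations as one BIO tag per token. The sketch below shows one way to convert token-level BIO tags into (label, start_char, end_char) spans so that both sides use the same representation. It assumes string tags such as "B-person"; for wnut_17 this presumes the tags live in a `ner_tags` column of integer class ids that can be mapped to names with `dataset.features["ner_tags"].feature.names`, which you should verify on your copy of the dataset. The function name and the sample sentence are hypothetical.

```python
def bio_to_char_spans(tokens, tags):
    """Convert token-level BIO tags to (label, start_char, end_char) spans.

    Offsets refer to the text obtained with " ".join(tokens).
    """
    spans = []
    current = None  # [label, start_char, end_char] of the entity being built
    position = 0    # character offset of the current token in the joined text
    for token, tag in zip(tokens, tags):
        start, end = position, position + len(token)
        label = tag[2:] if tag != "O" else None
        continues = tag.startswith("I-") and current is not None and current[0] == label
        if continues:
            # Extend the open entity to cover this token
            current[2] = end
        else:
            # Close any open entity, then open a new one if this token is tagged
            if current is not None:
                spans.append(tuple(current))
            current = [label, start, end] if label is not None else None
        position = end + 1  # account for the joining space
    if current is not None:
        spans.append(tuple(current))
    return spans


# Invented example, not taken from wnut_17
example_tokens = ["Empire", "State", "Building", "is", "in", "New", "York"]
example_tags = ["B-location", "I-location", "I-location", "O", "O", "B-location", "I-location"]
print(bio_to_char_spans(example_tokens, example_tags))
```

An "I-" tag that does not continue an open entity of the same label is treated as the start of a new entity, which is a lenient design choice for noisy tag sequences.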

2. Configure Flair TARSTagger#

Now let's configure our NER model, following Flair's documentation:

[ ]:
# Load zero-shot NER tagger
tars = TARSTagger.load("tars-ner")

# Define labels for named entities using wnut labels
tars.add_and_switch_to_new_task("task 1", labels, label_type="ner")

Let's test it with one example!

[ ]:

# Wrap our tokens in a flair Sentence
sentence = Sentence(" ".join(dataset[0]["tokens"]))

# Add predictions to our sentence
tars.predict(sentence)

# Extract predicted entities into a list of tuples (entity, start_char, end_char)
[
    (entity.get_labels()[0].value, entity.start_position, entity.end_position)
    for entity in sentence.get_spans("ner")
]

3. Predict over wnut_17 and log into argilla#

Now, let's log the predictions in Argilla:

[ ]:
# Build records for the first 100 examples
records = []

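for record in dataset.select(range(100)):
    input_text = " ".join(record["tokens"])

    # Run the zero-shot tagger on the sentence
    sentence = Sentence(input_text)
    tars.predict(sentence)
    prediction = [
        (entity.get_labels()[0].value, entity.start_position, entity.end_position)
        for entity in sentence.get_spans("ner")
    ]

    # Building TokenClassificationRecord
    records.append(
        rg.TokenClassificationRecord(
            text=input_text,
            tokens=[token.text for token in sentence],
            prediction=prediction,
            prediction_agent="tars-ner",
        )
    )
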
# Log the records to Argilla
rg.log(records, name="tars_ner_wnut_17", metadata={"split": "test"})

Now you can see the results obtained! With the annotation mode, you can change, add, validate or discard your results. Statistics are also available, to better monitor your records!
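Alongside visual inspection in the UI, a rough numeric check can help when judging zero-shot quality. The following is a minimal sketch of exact-match, span-level precision, recall and F1 between predicted and gold (label, start_char, end_char) tuples. The function name and the sample spans are hypothetical, and exact matching is only one possible criterion: a prediction that overlaps a gold entity but differs in its boundaries counts as an error here.

```python
def span_prf(predicted, gold):
    """Exact-match precision, recall and F1 over (label, start, end) spans."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented spans for a single sentence (not real model output)
predicted_spans = [("person", 0, 12), ("location", 21, 26), ("group", 30, 34)]
gold_spans = [("person", 0, 12), ("location", 21, 26)]

precision, recall, f1 = span_prf(predicted_spans, gold_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

To score a whole dataset rather than one sentence, one option is to prefix every span with its record index before calling the function, so that spans from different sentences cannot collide.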


Getting predictions with a zero-shot approach can be very helpful in guiding humans in their annotation process. Especially for NER tasks, Argilla makes it very easy to explore and correct those predictions thanks to its Hand-labeling Mode 😎.