Metrics#

Here we describe the available metrics in Argilla:

  • Common metrics: Metrics available for all datasets

  • Text classification: Metrics for text classification

  • Token classification: Metrics for token classification

Common metrics#

argilla.metrics.commons.keywords(name, query=None, size=20)#

Computes the keyword occurrence distribution for a dataset

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • size (int) – The number of top keywords to retrieve. Defaults to 20.

Returns

The dataset keywords occurrence distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.commons import keywords
>>> summary = keywords(name="example-dataset")
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # returns the raw result data
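
Both the query and size parameters can narrow the computation; a minimal sketch (the "text:argilla" query value is only an illustration of the query string syntax):

>>> summary = keywords(name="example-dataset", query="text:argilla", size=10)
>>> summary.data # the top-10 keywords for the matching records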
argilla.metrics.commons.records_status(name, query=None)#

Computes the records status distribution for a dataset

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The status distribution metric summary

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.commons import records_status
>>> summary = records_status(name="example-dataset")
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # returns the raw result data
argilla.metrics.commons.text_length(name, query=None)#

Computes the input text length metrics for a dataset

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The text length metric summary

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.commons import text_length
>>> summary = text_length(name="example-dataset")
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # returns the raw result data
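
All common metrics accept the same optional query filter; a short sketch restricting the metric to validated records (the "status:Validated" filter value is an assumption about the query string syntax):

>>> summary = text_length(name="example-dataset", query="status:Validated")
>>> summary.visualize() # distribution computed only over the matching records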

Text classification#

argilla.metrics.text_classification.metrics.f1(name, query=None)#

Computes the single-label f1 metric for a dataset

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The f1 metric summary

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.text_classification import f1
>>> summary = f1(name="example-dataset")
>>> summary.visualize() # will plot a bar chart with results
>>> summary.data # returns the raw result data
argilla.metrics.text_classification.metrics.f1_multilabel(name, query=None)#

Computes the multi-label f1 metric for a dataset

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The f1 metric summary

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.text_classification import f1_multilabel
>>> summary = f1_multilabel(name="example-dataset")
>>> summary.visualize() # will plot a bar chart with results
>>> summary.data # returns the raw result data

Token classification#

class argilla.metrics.token_classification.metrics.ComputeFor(value)#

An enumeration.
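
A minimal usage sketch (the member names are inferred from the compute_for defaults shown below; the plain strings accepted by each metric work as well):

>>> from argilla.metrics.token_classification.metrics import ComputeFor
>>> ComputeFor.PREDICTIONS # the default used by the metrics below
>>> ComputeFor.ANNOTATIONS # compute a metric over the annotations instead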

argilla.metrics.token_classification.metrics.entity_capitalness(name, query=None, compute_for=ComputeFor.PREDICTIONS)#

Computes the entity capitalness. The entity capitalness splits entity mentions into 4 groups:

UPPER: All characters in entity mention are upper case.

LOWER: All characters in entity mention are lower case.

FIRST: The first character in the mention is upper case.

MIDDLE: First character in the mention is lower case and at least one other character is upper case.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • compute_for (Union[str, argilla.metrics.token_classification.metrics.ComputeFor]) – Metric can be computed for annotations or predictions. Accepted values are Annotations and Predictions. Defaults to Predictions.

Returns

The summary for the entity capitalness distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import entity_capitalness
>>> summary = entity_capitalness(name="example-dataset")
>>> summary.visualize()
argilla.metrics.token_classification.metrics.entity_density(name, query=None, compute_for=ComputeFor.PREDICTIONS, interval=0.005)#

Computes the entity density distribution. The entity density is calculated at the record level for each mention as mention_length/tokens_length

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • compute_for (Union[str, argilla.metrics.token_classification.metrics.ComputeFor]) – Metric can be computed for annotations or predictions. Accepted values are Annotations and Predictions. Defaults to Predictions.

  • interval (float) – The interval for the histogram. The entity density is defined in the range 0-1.

Returns

The summary for the entity density distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import entity_density
>>> summary = entity_density(name="example-dataset")
>>> summary.visualize()
argilla.metrics.token_classification.metrics.entity_labels(name, query=None, compute_for=ComputeFor.PREDICTIONS, labels=50)#

Computes the entity labels distribution

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • compute_for (Union[str, argilla.metrics.token_classification.metrics.ComputeFor]) – Metric can be computed for annotations or predictions. Accepted values are Annotations and Predictions. Defaults to Predictions.

  • labels (int) – The number of top entities to retrieve. Lower numbers will perform better.

Returns

The summary for entity tags distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import entity_labels
>>> summary = entity_labels(name="example-dataset", labels=20)
>>> summary.visualize() # will plot a bar chart with results
>>> summary.data # The top-20 entity tags
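
The same distribution can be computed over the annotations instead of the predictions; a minimal sketch (passing the string "annotations" is assumed to be accepted, per the compute_for parameter above):

>>> summary = entity_labels(name="example-dataset", compute_for="annotations")
>>> summary.visualize() # entity labels distribution of the annotations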
argilla.metrics.token_classification.metrics.f1(name, query=None)#

Computes entity-level F1 metrics for a dataset.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The F1 metric summary containing precision, recall and the F1 score (averaged and per label).

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import f1
>>> summary = f1(name="example-dataset")
>>> summary.visualize() # will plot three bar charts with the results
>>> summary.data # returns the raw result data

To display the results as a table:

>>> import pandas as pd
>>> pd.DataFrame(summary.data.values(), index=summary.data.keys())
argilla.metrics.token_classification.metrics.mention_length(name, query=None, level='token', compute_for=ComputeFor.PREDICTIONS, interval=1)#

Computes the mention length distribution, measured in tokens or characters depending on the level parameter.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • level (str) – The mention length level. Accepted values are "token" and "char".

  • compute_for (Union[str, argilla.metrics.token_classification.metrics.ComputeFor]) – Metric can be computed for annotations or predictions. Accepted values are Annotations and Predictions. Defaults to Predictions.

  • interval (int) – The bin size for the result histogram.

Returns

The summary for the mention length distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import mention_length
>>> summary = mention_length(name="example-dataset", interval=2)
>>> summary.visualize() # will plot a histogram chart with results
>>> summary.data # the raw histogram data with bins of size 2
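
Mention lengths can also be measured in characters through the level parameter, following the accepted values listed above:

>>> summary = mention_length(name="example-dataset", level="char")
>>> summary.visualize() # mention length distribution in characters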
argilla.metrics.token_classification.metrics.token_capitalness(name, query=None)#

Computes the token capitalness distribution. The token capitalness splits tokens into 4 groups:

UPPER: All characters in the token are upper case.

LOWER: All characters in the token are lower case.

FIRST: The first character in the token is upper case.

MIDDLE: First character in the token is lower case and at least one other character is upper case.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The summary for the token capitalness distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import token_capitalness
>>> summary = token_capitalness(name="example-dataset")
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # The token capitalness distribution
argilla.metrics.token_classification.metrics.token_frequency(name, query=None, tokens=1000)#

Computes the token frequency distribution for a number of tokens.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • tokens (int) – The top-k number of tokens to retrieve

Returns

The summary for token frequency distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import token_frequency
>>> summary = token_frequency(name="example-dataset", tokens=50)
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # the top-50 tokens frequency
argilla.metrics.token_classification.metrics.token_length(name, query=None)#

Computes the token length distribution in number of characters

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

Returns

The summary for token length distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import token_length
>>> summary = token_length(name="example-dataset")
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # The token length distribution
argilla.metrics.token_classification.metrics.tokens_length(name, query=None, interval=1)#

Computes the text length distribution measured in number of tokens.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • interval (int) – The bin size for the result histogram.

Returns

The summary for the tokens length distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import tokens_length
>>> summary = tokens_length(name="example-dataset", interval=5)
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # the raw histogram data with bins of size 5
argilla.metrics.token_classification.metrics.top_k_mentions(name, query=None, compute_for=ComputeFor.PREDICTIONS, k=100, threshold=2, post_label_filter=None)#

Computes the consistency for top k mentions in the dataset.

Entity consistency defines the label variability for a given mention. For example, a mention "first" identified across the dataset as Cardinal, Person, and Time is less consistent than a mention "Peter" identified only as Person.

Parameters
  • name (str) – The dataset name.

  • query (Optional[str]) –

    An ElasticSearch query with the query string syntax

  • compute_for (Union[str, argilla.metrics.token_classification.metrics.ComputeFor]) – Metric can be computed for annotations or predictions. Accepted values are Annotations and Predictions. Defaults to Predictions.

  • k (int) – The number of mentions to retrieve.

  • threshold (int) – The entity variability threshold (must be greater than or equal to 1).

  • post_label_filter (Optional[Set[str]]) – A set of labels used for filtering the results. This filter may affect the expected number of mentions.

Returns

The summary for the top k mentions distribution

Return type

argilla.metrics.models.MetricSummary

Examples

>>> from argilla.metrics.token_classification import top_k_mentions
>>> summary = top_k_mentions(name="example-dataset")
>>> summary.visualize()
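
A sketch combining k, threshold and post_label_filter (the "PER" label is a hypothetical example; use a label that exists in your dataset):

>>> summary = top_k_mentions(
...     name="example-dataset",
...     k=20, # retrieve the top-20 mentions
...     threshold=2, # entity variability threshold
...     post_label_filter={"PER"}, # "PER" is a hypothetical label
... )
>>> summary.visualize()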