Client#

Here we describe the Python client of Argilla that we divide into four basic modules:

  • Methods: These methods make up the interface to interact with Argilla's REST API.

  • Records: You need to wrap your data in these Records for Argilla to understand it.

  • Datasets: You can wrap your records in these Datasets for extra functionality.

  • FeedbackDataset: the dataset format for the FeedbackTask and LLM support.

Methods#

argilla.active_client()#

Returns the active Argilla client.

If the active client is None, a default one is initialized.

Return type:

Argilla
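
For illustration, a minimal sketch of grabbing the active client after initialization (the URL and API key below are placeholders):

>>> import argilla as rg
>>> rg.init(api_url="http://localhost:6900", api_key="argilla.apikey")
>>> client = rg.active_client()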

argilla.copy(dataset, name_of_copy, workspace=None)#

Creates a copy of a dataset including its tags and metadata

Parameters:
  • dataset (str) โ€“ Name of the source dataset

  • name_of_copy (str) โ€“ Name of the copied dataset

  • workspace (Optional[str]) โ€“ If provided, dataset will be copied to that workspace

Return type:

None

Examples

>>> import argilla as rg
>>> rg.copy("my_dataset", name_of_copy="new_dataset")
>>> rg.load("new_dataset")
argilla.delete(name, workspace=None)#

Deletes an Argilla dataset from the server. It can be used with both Dataset and FeedbackDataset, although for the latter it's recommended to use rg.FeedbackDataset.delete instead.

Parameters:
  • name (str) โ€“ The name of the dataset to delete.

  • workspace (Optional[str]) โ€“ The workspace to which the dataset belongs. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.

Raises:
  • ValueError – If no dataset is found with the given name and workspace.

  • PermissionError – If the dataset that's being deleted is a FeedbackDataset and the user doesn't have enough permissions to delete it.

  • RuntimeError – If the dataset that's being deleted is a FeedbackDataset and some kind of error occurs during the deletion process.

Return type:

None

Examples

>>> import argilla as rg
>>> rg.delete(name="example-dataset")
argilla.delete_records(name, workspace=None, query=None, ids=None, discard_only=False, discard_when_forbidden=True)#

Deletes records from an Argilla dataset.

Parameters:
  • name (str) โ€“ The dataset name.

  • workspace (Optional[str]) โ€“ The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.

  • query (Optional[str]) โ€“ An ElasticSearch query with the query string syntax

  • ids (Optional[List[Union[str, int]]]) โ€“ If provided, deletes dataset records with given ids.

  • discard_only (bool) – If True, matched records won't be deleted. Instead, they will be marked as Discarded.

  • discard_when_forbidden (bool) – Only a super-user or the dataset creator can delete records from a dataset, so running a "hard" deletion as another user will raise a ForbiddenApiError. If this parameter is True, the client API will automatically try to mark the records as Discarded instead. Defaults to True.

Returns:

The total number of matched records and the number of records actually processed. These numbers may differ if data conflicts are found during the operation (some matched records can change during deletion).

Return type:

Tuple[int, int]

Examples

>>> ## Delete by id
>>> import argilla as rg
>>> rg.delete_records(name="example-dataset", ids=[1,3,5])
>>> ## Discard records by query
>>> import argilla as rg
>>> rg.delete_records(name="example-dataset", query="metadata.code=33", discard_only=True)
argilla.get_workspace()#

Returns the name of the active workspace.

Returns:

The name of the active workspace as a string.

Return type:

str
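
As a rough sketch (the workspace name is hypothetical), the active workspace can be checked right after initialization:

>>> import argilla as rg
>>> rg.init(api_url="http://localhost:6900", api_key="argilla.apikey", workspace="my-workspace")
>>> rg.get_workspace()
'my-workspace'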

argilla.init(api_url=None, api_key=None, workspace=None, timeout=60, extra_headers=None, httpx_extra_kwargs=None)#

Init the Python client.

If this function is called with api_url=None and api_key=None and no values have been set for the environment variables ARGILLA_API_URL and ARGILLA_API_KEY, then the local credentials stored by a previous call to the argilla login command will be used. If local credentials are not found, then api_url and api_key will fall back to the default values.

Parameters:
  • api_url (Optional[str]) โ€“ Address of the REST API. If None (default) and the env variable ARGILLA_API_URL is not set, it will default to http://localhost:6900.

  • api_key (Optional[str]) โ€“ Authentication key for the REST API. If None (default) and the env variable ARGILLA_API_KEY is not set, it will default to argilla.apikey.

  • workspace (Optional[str]) โ€“ The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.

  • timeout (int) โ€“ Wait timeout seconds for the connection to timeout. Default: 60.

  • extra_headers (Optional[Dict[str, str]]) โ€“ Extra HTTP headers sent to the server. You can use this to customize the headers of argilla client requests, like additional security restrictions. Default: None.

  • httpx_extra_kwargs (Optional[Dict[str, Any]]) โ€“ Extra kwargs passed to the httpx.Client constructor. For more information about the available arguments, see https://www.python-httpx.org/api/#client. Defaults to None.

Return type:

None

Examples

>>> import argilla as rg
>>>
>>> rg.init(api_url="http://localhost:9090", api_key="4AkeAPIk3Y")
>>> # Customizing request headers
>>> headers = {"X-Client-id":"id","X-Secret":"secret"}
>>> rg.init(api_url="http://localhost:9090", api_key="4AkeAPIk3Y", extra_headers=headers)
argilla.load(name, workspace=None, query=None, vector=None, ids=None, limit=None, sort=None, id_from=None, batch_size=250, include_vectors=True, include_metrics=True, as_pandas=None)#

Loads an Argilla dataset.

Parameters:
  • name (str) โ€“ The dataset name.

  • workspace (Optional[str]) โ€“ The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.

  • query (Optional[str]) – An ElasticSearch query with the query string syntax

  • vector (Optional[Tuple[str, List[float]]]) โ€“ Vector configuration for a semantic search

  • ids (Optional[List[Union[str, int]]]) โ€“ If provided, load dataset records with given ids.

  • limit (Optional[int]) โ€“ The number of records to retrieve.

  • sort (Optional[List[Tuple[str, str]]]) – The fields on which to sort [(<field_name>, 'asc|desc')].

  • id_from (Optional[str]) โ€“ If provided, starts gathering the records starting from that Record. As the Records returned with the load method are sorted by ID, id_from can be used to load using batches.

  • batch_size (int) โ€“ If provided, load batch_size samples per request. A lower batch size may help avoid timeouts.

  • include_vectors (bool) โ€“ When set to False, indicates that records will be retrieved excluding their vectors, if any. By default, this parameter is set to True, meaning that vectors will be included.

  • include_metrics (bool) โ€“ When set to False, indicates that records will be retrieved excluding their metrics. By default, this parameter is set to True, meaning that metrics will be included.

  • as_pandas (Optional[bool]) โ€“ DEPRECATED! To get a pandas DataFrame do rg.load('my_dataset').to_pandas().

Returns:

An Argilla dataset.

Return type:

Union[DatasetForTextClassification, DatasetForTokenClassification, DatasetForText2Text, RemoteFeedbackDataset]

Examples

Basic Loading: load the samples sorted by their ID

>>> import argilla as rg
>>> dataset = rg.load(name="example-dataset")
Iterate over a large dataset:

When dealing with a large dataset you might want to load it in batches to optimize memory consumption and avoid network timeouts. To that end, a simple batch-iteration over the whole database can be done employing the id_from parameter. This parameter will act as a delimiter, retrieving the N items after the given id, where N is determined by the limit parameter. NOTE: If no limit is given, the whole dataset after that ID will be retrieved.

>>> import argilla as rg
>>> dataset_batch_1 = rg.load(name="example-dataset", limit=1000)
>>> dataset_batch_2 = rg.load(name="example-dataset", limit=1000, id_from=dataset_batch_1[-1].id)
argilla.log(records, name, workspace=None, tags=None, metadata=None, batch_size=100, verbose=True, background=False, chunk_size=None, num_threads=0, max_retries=3)#

Logs Records to Argilla.

The logging happens asynchronously in a background thread.

Parameters:
  • records (Union[TextClassificationRecord, TokenClassificationRecord, Text2TextRecord, TextGenerationRecord, Iterable[Union[TextClassificationRecord, TokenClassificationRecord, Text2TextRecord, TextGenerationRecord]], DatasetForTextClassification, DatasetForTokenClassification, DatasetForText2Text]) โ€“ The record, an iterable of records, or a dataset to log.

  • name (str) โ€“ The dataset name.

  • workspace (Optional[str]) โ€“ The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.

  • tags (Optional[Dict[str, str]]) โ€“ A dictionary of tags related to the dataset.

  • metadata (Optional[Dict[str, Any]]) โ€“ A dictionary of extra info for the dataset.

  • batch_size (int) โ€“ The batch size for a data bulk.

  • verbose (bool) โ€“ If True, shows a progress bar and prints out a quick summary at the end.

  • background (bool) – If True, the call will NOT wait for the logging process to finish and will return an asyncio.Future object instead. You probably want to set verbose to False in that case.

  • chunk_size (Optional[int]) โ€“ DEPRECATED! Use batch_size instead.

  • num_threads (int) – If > 0, uses num_threads separate threads to send the record batches concurrently. Defaults to 0, which means no threading at all.

  • max_retries (int) โ€“ Number of retries when logging a batch of records if a httpx.TransportError occurs. Default 3.

Returns:

Summary of the response from the REST API. If the background argument is set to True, an asyncio.Future will be returned instead.

Return type:

Union[BulkResponse, Future]

Examples

>>> import argilla as rg
>>> record = rg.TextClassificationRecord(
...     text="my first argilla example",
...     prediction=[('spam', 0.8), ('ham', 0.2)]
... )
>>> rg.log(record, name="example-dataset")
1 records logged to http://localhost:6900/datasets/argilla/example-dataset
BulkResponse(dataset='example-dataset', processed=1, failed=0)
>>>
>>> # Logging records in the background
>>> rg.log(record, name="example-dataset", background=True, verbose=False)
<Future at 0x7f675a1fffa0 state=pending>
argilla.set_workspace(workspace)#

Sets the active workspace.

Parameters:

workspace (str) โ€“ The new workspace

Return type:

None
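
A minimal sketch of switching workspaces, assuming a workspace named "my-workspace" already exists on the server:

>>> import argilla as rg
>>> rg.set_workspace("my-workspace")
>>> rg.get_workspace()
'my-workspace'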

Records#

This module contains the data models for the interface

class argilla.client.models.Framework(value)#

Frameworks supported by Argilla

Options:

transformers: Transformers
peft: PEFT Transformers library
setfit: SetFit Transformers library
spacy: Spacy Explosion
spacy-transformers: Spacy Transformers Explosion library
span_marker: SpanMarker Tom Aarsen library
spark-nlp: Spark NLP John Snow Labs library
openai: OpenAI LLMs
trl: Transformer Reinforcement Learning
sentence-transformers: Sentence Transformers library
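
As a quick sketch, and assuming the strings listed above are the enum values, a Framework member can be built from its string value and passed wherever a framework argument is expected (for example, prepare_for_training):

>>> from argilla.client.models import Framework
>>> framework = Framework("setfit")  # look up the member by its string value
>>> framework.value
'setfit'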

class argilla.client.models.Text2TextRecord(*, text, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#

Record for a text to text task

Parameters:
  • text (str) โ€“ The input of the record

  • prediction (Optional[List[Union[str, Tuple[str, float]]]]) โ€“ A list of strings or tuples containing predictions for the input text. If tuples, the first entry is the predicted text, the second entry is its corresponding score.

  • prediction_agent (Optional[str]) โ€“ Name of the prediction agent. By default, this is set to the hostname of your machine.

  • annotation (Optional[str]) โ€“ A string representing the expected output text for the given input text.

  • annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.

  • vectors (Optional[Dict[str, List[float]]]) – Embedding data mappings of the natural language text containing class attributes.

  • id (Optional[Union[int, str]]) โ€“ The id of the record. By default (None), we will generate a unique ID for you.

  • metadata (Optional[Dict[str, Any]]) โ€“ Metadata for the record. Defaults to {}.

  • status (Optional[str]) – The status of the record. Options: 'Default', 'Edited', 'Discarded', 'Validated'. If an annotation is provided, this defaults to 'Validated', otherwise 'Default'.

  • event_timestamp (Optional[datetime]) โ€“ The timestamp for the creation of the record. Defaults to datetime.datetime.now().

  • metrics (Optional[Dict[str, Any]]) โ€“ READ ONLY! Metrics at record level provided by the server when using rg.load. This attribute will be ignored when using rg.log.

  • search_keywords (Optional[List[str]]) โ€“ READ ONLY! Relevant record keywords/terms for provided query when using rg.load. This attribute will be ignored when using rg.log.

Examples

>>> import argilla as rg
>>> record = rg.Text2TextRecord(
...     text="My name is Sarah and I love my dog.",
...     prediction=["Je m'appelle Sarah et j'aime mon chien."],
...     vectors = {
...         "bert_base_uncased": [1.2, 2.3, 3.4, 5.2, 6.5],
...         "xlm_multilingual_uncased": [2.2, 5.3, 5.4, 3.2, 2.5]
...     }
... )
classmethod prediction_as_tuples(prediction)#

Preprocesses the predictions and wraps them in tuples if needed.

Parameters:

prediction (Optional[List[Union[str, Tuple[str, float]]]]) โ€“

class argilla.client.models.TextClassificationRecord(*, text=None, inputs=None, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, multi_label=False, explanation=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#

Record for text classification

Parameters:
  • text (Optional[str]) – The input of the record. Provide either 'text' or 'inputs'.

  • inputs (Optional[Union[str, List[str], Dict[str, Union[str, List[str]]]]]) – Various inputs of the record (see examples below). Provide either 'text' or 'inputs'.

  • prediction (Optional[List[Tuple[str, float]]]) โ€“ A list of tuples containing the predictions for the record. The first entry of the tuple is the predicted label, the second entry is its corresponding score.

  • prediction_agent (Optional[str]) โ€“ Name of the prediction agent. By default, this is set to the hostname of your machine.

  • annotation (Optional[Union[str, List[str]]]) โ€“ A string or a list of strings (multilabel) corresponding to the annotation (gold label) for the record.

  • annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.

  • vectors (Optional[Dict[str, List[float]]]) โ€“ Vectors data mappings of the natural language text containing class attributes

  • multi_label (bool) โ€“ Is the prediction/annotation for a multi label classification task? Defaults to False.

  • explanation (Optional[Dict[str, List[TokenAttributions]]]) โ€“ A dictionary containing the attributions of each token to the prediction. The keys map the input of the record (see inputs) to the TokenAttributions.

  • id (Optional[Union[int, str]]) โ€“ The id of the record. By default (None), we will generate a unique ID for you.

  • metadata (Optional[Dict[str, Any]]) โ€“ Metadata for the record. Defaults to {}.

  • status (Optional[str]) – The status of the record. Options: 'Default', 'Edited', 'Discarded', 'Validated'. If an annotation is provided, this defaults to 'Validated', otherwise 'Default'.

  • event_timestamp (Optional[datetime]) โ€“ The timestamp for the creation of the record. Defaults to datetime.datetime.now().

  • metrics (Optional[Dict[str, Any]]) โ€“ READ ONLY! Metrics at record level provided by the server when using rg.load. This attribute will be ignored when using rg.log.

  • search_keywords (Optional[List[str]]) โ€“ READ ONLY! Relevant record keywords/terms for provided query when using rg.load. This attribute will be ignored when using rg.log.

Examples

>>> # Single text input
>>> import argilla as rg
>>> record = rg.TextClassificationRecord(
...     text="My first argilla example",
...     prediction=[('eng', 0.9), ('esp', 0.1)],
...     vectors = {
...         "english_bert_vector": [1.2, 2.3, 3.1, 3.3]
...     }
... )
>>>
>>> # Various inputs
>>> record = rg.TextClassificationRecord(
...     inputs={
...         "subject": "Has ganado 1 million!",
...         "body": "Por usar argilla te ha tocado este premio: <link>"
...     },
...     prediction=[('spam', 0.99), ('ham', 0.01)],
...     annotation="spam",
...     vectors = {
...         "distilbert_uncased": [1.13, 4.1, 6.3, 4.2, 9.1],
...         "xlm_roberta_cased": [1.1, 2.1, 3.3, 4.2, 2.1],
...     }
... )
class argilla.client.models.TextGenerationRecord(*, text, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#
Parameters:
  • text (str) โ€“

  • prediction (Optional[List[Union[str, Tuple[str, float]]]]) โ€“

  • prediction_agent (Optional[str]) โ€“

  • annotation (Optional[str]) โ€“

  • annotation_agent (Optional[str]) โ€“

  • vectors (Optional[Dict[str, List[float]]]) โ€“

  • id (Optional[Union[int, str]]) โ€“

  • metadata (Optional[Dict[str, Any]]) โ€“

  • status (Optional[str]) โ€“

  • event_timestamp (Optional[datetime]) โ€“

  • metrics (Optional[Dict[str, Any]]) โ€“

  • search_keywords (Optional[List[str]]) โ€“

class argilla.client.models.TokenAttributions(*, token, attributions=None)#

Attribution of the token to the predicted label.

In the Argilla app this is only supported for TextClassificationRecord and the multi_label=False case.

Parameters:
  • token (str) โ€“ The input token.

  • attributions (Dict[str, float]) โ€“ A dictionary containing label-attribution pairs.
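
For illustration, a rough sketch of attaching token attributions to a text classification record through its explanation argument (the attribution values are made up):

>>> import argilla as rg
>>> from argilla.client.models import TokenAttributions
>>> attributions = [
...     TokenAttributions(token="first", attributions={"spam": 0.3, "ham": 0.7}),
...     TokenAttributions(token="example", attributions={"spam": 0.1, "ham": 0.9}),
... ]
>>> record = rg.TextClassificationRecord(
...     text="my first example",
...     prediction=[("ham", 0.9), ("spam", 0.1)],
...     explanation={"text": attributions},
... )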

class argilla.client.models.TokenClassificationRecord(text=None, tokens=None, tags=None, *, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#

Record for a token classification task

Parameters:
  • text (Optional[str]) โ€“ The input of the record

  • tokens (Optional[Union[List[str], Tuple[str, ...]]]) โ€“ The tokenized input of the record. We use this to guide the annotation process and to cross-check the spans of your prediction/annotation.

  • prediction (Optional[List[Union[Tuple[str, int, int], Tuple[str, int, int, Optional[float]]]]]) โ€“ A list of tuples containing the predictions for the record. The first entry of the tuple is the name of predicted entity, the second and third entry correspond to the start and stop character index of the entity. The fourth entry is optional and corresponds to the score of the entity (a float number between 0 and 1).

  • prediction_agent (Optional[str]) โ€“ Name of the prediction agent. By default, this is set to the hostname of your machine.

  • annotation (Optional[List[Tuple[str, int, int]]]) โ€“ A list of tuples containing annotations (gold labels) for the record. The first entry of the tuple is the name of the entity, the second and third entry correspond to the start and stop char index of the entity.

  • annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.

  • vectors (Optional[Dict[str, List[float]]]) – Vector data mappings of the natural language text containing class attributes.

  • id (Optional[Union[int, str]]) โ€“ The id of the record. By default (None), we will generate a unique ID for you.

  • metadata (Optional[Dict[str, Any]]) โ€“ Metadata for the record. Defaults to {}.

  • status (Optional[str]) – The status of the record. Options: 'Default', 'Edited', 'Discarded', 'Validated'. If an annotation is provided, this defaults to 'Validated', otherwise 'Default'.

  • event_timestamp (Optional[datetime]) โ€“ The timestamp for the creation of the record. Defaults to datetime.datetime.now().

  • metrics (Optional[Dict[str, Any]]) โ€“ READ ONLY! Metrics at record level provided by the server when using rg.load. This attribute will be ignored when using rg.log.

  • search_keywords (Optional[List[str]]) โ€“ READ ONLY! Relevant record keywords/terms for provided query when using rg.load. This attribute will be ignored when using rg.log.

  • tags (Optional[List[str]]) โ€“

Examples

>>> import argilla as rg
>>> record = rg.TokenClassificationRecord(
...     text = "Michael is a professor at Harvard",
...     tokens = ["Michael", "is", "a", "professor", "at", "Harvard"],
...     prediction = [('NAME', 0, 7), ('LOC', 26, 33)],
...     vectors = {
...            "bert_base_uncased": [3.2, 4.5, 5.6, 8.9]
...          }
... )
char_id2token_id(char_idx)#

DEPRECATED, please use the argilla.utils.span_utils.SpanUtils.char_to_token_idx dict instead.

Parameters:

char_idx (int) โ€“

Return type:

Optional[int]

spans2iob(spans=None)#

DEPRECATED, please use the argilla.utils.SpanUtils.to_tags() method.

Parameters:

spans (Optional[List[Tuple[str, int, int]]]) โ€“

Return type:

Optional[List[str]]

token_span(token_idx)#

DEPRECATED, please use the argilla.utils.span_utils.SpanUtils.token_to_char_idx dict instead.

Parameters:

token_idx (int) โ€“

Return type:

Tuple[int, int]

Datasets#

class argilla.client.datasets.DatasetForText2Text(records=None)#

This Dataset contains Text2TextRecord records.

It allows you to export/import records into/from different formats, loop over the records, and access them by index.

Parameters:

records (Optional[List[Text2TextRecord]]) โ€“ A list of `Text2TextRecord`s.

Raises:

WrongRecordTypeError โ€“ When the record type in the provided list does not correspond to the dataset type.

Examples

>>> # Import/export records:
>>> import argilla as rg
>>> dataset = rg.DatasetForText2Text.from_pandas(my_dataframe)
>>> dataset.to_datasets()
>>>
>>> # Passing in a list of records:
>>> records = [
...     rg.Text2TextRecord(text="example"),
...     rg.Text2TextRecord(text="another example"),
... ]
>>> dataset = rg.DatasetForText2Text(records)
>>> assert len(dataset) == 2
>>>
>>> # Looping over the dataset:
>>> for record in dataset:
...     print(record)
>>>
>>> # Indexing into the dataset:
>>> dataset[0]
... rg.Text2TextRecord(text="example")
>>> dataset[0] = rg.Text2TextRecord(text="replaced example")
classmethod from_datasets(dataset, text=None, annotation=None, metadata=None, id=None)#

Imports records from a datasets.Dataset.

Columns that are not supported are ignored.

Parameters:
  • dataset (datasets.Dataset) โ€“ A datasets Dataset from which to import the records.

  • text (Optional[str]) โ€“ The field name used as record text. Default: None

  • annotation (Optional[str]) โ€“ The field name used as record annotation. Default: None

  • metadata (Optional[Union[str, List[str]]]) โ€“ The field name used as record metadata. Default: None

  • id (Optional[str]) โ€“

Returns:

The imported records in an Argilla Dataset.

Return type:

DatasetForText2Text

Examples

>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [["mi ejemplo", "ejemplo mio"]]
... })
>>> # or
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [[{"text": "mi ejemplo", "score": 0.9}]]
... })
>>> DatasetForText2Text.from_datasets(ds)
classmethod from_pandas(dataframe)#

Imports records from a pandas.DataFrame.

Columns that are not supported are ignored.

Parameters:

dataframe (DataFrame) โ€“ A pandas DataFrame from which to import the records.

Returns:

The imported records in an Argilla Dataset.

Return type:

DatasetForText2Text

class argilla.client.datasets.DatasetForTextClassification(records=None)#

This Dataset contains TextClassificationRecord records.

It allows you to export/import records into/from different formats, loop over the records, and access them by index.

Parameters:

records (Optional[List[TextClassificationRecord]]) โ€“ A list of `TextClassificationRecord`s.

Raises:

WrongRecordTypeError โ€“ When the record type in the provided list does not correspond to the dataset type.

Examples

>>> # Import/export records:
>>> import argilla as rg
>>> dataset = rg.DatasetForTextClassification.from_pandas(my_dataframe)
>>> dataset.to_datasets()
>>>
>>> # Looping over the dataset:
>>> for record in dataset:
...     print(record)
>>>
>>> # Passing in a list of records:
>>> records = [
...     rg.TextClassificationRecord(text="example"),
...     rg.TextClassificationRecord(text="another example"),
... ]
>>> dataset = rg.DatasetForTextClassification(records)
>>> assert len(dataset) == 2
>>>
>>> # Indexing into the dataset:
>>> dataset[0]
... rg.TextClassificationRecord(text="example")
>>> dataset[0] = rg.TextClassificationRecord(text="replaced example")
classmethod from_datasets(dataset, text=None, id=None, inputs=None, annotation=None, metadata=None)#

Imports records from a datasets.Dataset.

Columns that are not supported are ignored.

Parameters:
  • dataset (datasets.Dataset) โ€“ A datasets Dataset from which to import the records.

  • text (Optional[str]) โ€“ The field name used as record text. Default: None

  • id (Optional[str]) โ€“ The field name used as record id. Default: None

  • inputs (Optional[Union[str, List[str]]]) โ€“ A list of field names used for record inputs. Default: None

  • annotation (Optional[str]) โ€“ The field name used as record annotation. Default: None

  • metadata (Optional[Union[str, List[str]]]) โ€“ The field name used as record metadata. Default: None

Returns:

The imported records in an Argilla Dataset.

Return type:

DatasetForTextClassification

Examples

>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "inputs": ["example"],
...     "prediction": [
...         [{"label": "LABEL1", "score": 0.9}, {"label": "LABEL2", "score": 0.1}]
...     ]
... })
>>> DatasetForTextClassification.from_datasets(ds)
classmethod from_pandas(dataframe)#

Imports records from a pandas.DataFrame.

Columns that are not supported are ignored.

Parameters:

dataframe (DataFrame) โ€“ A pandas DataFrame from which to import the records.

Returns:

The imported records in an Argilla Dataset.

Return type:

DatasetForTextClassification
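
A minimal sketch of importing records from a pandas DataFrame, assuming the column names mirror the TextClassificationRecord fields (unsupported columns are ignored):

>>> import pandas as pd
>>> import argilla as rg
>>> df = pd.DataFrame({
...     "text": ["example", "another example"],
...     "annotation": ["ham", "spam"],
... })
>>> dataset = rg.DatasetForTextClassification.from_pandas(df)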

class argilla.client.datasets.DatasetForTokenClassification(records=None)#

This Dataset contains TokenClassificationRecord records.

It allows you to export/import records into/from different formats, loop over the records, and access them by index.

Parameters:

records (Optional[List[TokenClassificationRecord]]) โ€“ A list of `TokenClassificationRecord`s.

Raises:

WrongRecordTypeError โ€“ When the record type in the provided list does not correspond to the dataset type.

Examples

>>> # Import/export records:
>>> import argilla as rg
>>> dataset = rg.DatasetForTokenClassification.from_pandas(my_dataframe)
>>> dataset.to_datasets()
>>>
>>> # Looping over the dataset:
>>> assert len(dataset) == 2
>>> for record in dataset:
...     print(record)
>>>
>>> # Passing in a list of records:
>>> import argilla as rg
>>> records = [
...     rg.TokenClassificationRecord(text="example", tokens=["example"]),
...     rg.TokenClassificationRecord(text="another example", tokens=["another", "example"]),
... ]
>>> dataset = rg.DatasetForTokenClassification(records)
>>>
>>> # Indexing into the dataset:
>>> dataset[0]
... rg.TokenClassificationRecord(text="example", tokens=["example"])
>>> dataset[0] = rg.TokenClassificationRecord(text="replace example", tokens=["replace", "example"])
classmethod from_datasets(dataset, text=None, id=None, tokens=None, tags=None, metadata=None)#

Imports records from a datasets.Dataset.

Columns that are not supported are ignored.

Parameters:
  • dataset (datasets.Dataset) โ€“ A datasets Dataset from which to import the records.

  • text (Optional[str]) โ€“ The field name used as record text. Default: None

  • id (Optional[str]) โ€“ The field name used as record id. Default: None

  • tokens (Optional[str]) โ€“ The field name used as record tokens. Default: None

  • tags (Optional[str]) โ€“ The field name used as record tags. Default: None

  • metadata (Optional[Union[str, List[str]]]) โ€“ The field name used as record metadata. Default: None

Returns:

The imported records in an Argilla Dataset.

Return type:

DatasetForTokenClassification

Examples

>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "tokens": [["my", "example"]],
...     "prediction": [
...         [{"label": "LABEL1", "start": 3, "end": 10, "score": 1.0}]
...     ]
... })
>>> DatasetForTokenClassification.from_datasets(ds)
classmethod from_pandas(dataframe)#

Imports records from a pandas.DataFrame.

Columns that are not supported are ignored.

Parameters:

dataframe (DataFrame) โ€“ A pandas DataFrame from which to import the records.

Returns:

The imported records in an Argilla Dataset.

Return type:

DatasetForTokenClassification

argilla.client.datasets.read_datasets(dataset, task, **kwargs)#

Reads a datasets Dataset and returns an Argilla Dataset

Columns not supported by the Record instance corresponding with the task are ignored.

Parameters:
  • dataset (datasets.Dataset) โ€“ Dataset to be read in.

  • task (Union[str, TaskType]) – Task for the dataset, one of: ["TextClassification", "TokenClassification", "Text2Text"].

  • **kwargs โ€“ Passed on to the task-specific DatasetFor*.from_datasets() method.

Returns:

An Argilla dataset for the given task.

Return type:

Union[DatasetForTextClassification, DatasetForTokenClassification, DatasetForText2Text]

Examples

>>> # Read text classification records from a datasets Dataset
>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "inputs": ["example"],
...     "prediction": [
...         [{"label": "LABEL1", "score": 0.9}, {"label": "LABEL2", "score": 0.1}]
...     ]
... })
>>> read_datasets(ds, task="TextClassification")
>>>
>>> # Read token classification records from a datasets Dataset
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "tokens": [["my", "example"]],
...     "prediction": [
...         [{"label": "LABEL1", "start": 3, "end": 10}]
...     ]
... })
>>> read_datasets(ds, task="TokenClassification")
>>>
>>> # Read text2text records from a datasets Dataset
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [["mi ejemplo", "ejemplo mio"]]
... })
>>> # or
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [[{"text": "mi ejemplo", "score": 0.9}]]
... })
>>> read_datasets(ds, task="Text2Text")
argilla.client.datasets.read_pandas(dataframe, task)#

Reads a pandas DataFrame and returns an Argilla Dataset

Columns not supported by the Record instance corresponding with the task are ignored.

Parameters:
  • dataframe (DataFrame) โ€“ Dataframe to be read in.

  • task (Union[str, TaskType]) – Task for the dataset, one of: ["TextClassification", "TokenClassification", "Text2Text"]

Returns:

An Argilla dataset for the given task.

Return type:

Union[DatasetForTextClassification, DatasetForTokenClassification, DatasetForText2Text]

Examples

>>> # Read text classification records from a pandas DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "inputs": ["example"],
...     "prediction": [
...         [("LABEL1", 0.9), ("LABEL2", 0.1)]
...     ]
... })
>>> read_pandas(df, task="TextClassification")
>>>
>>> # Read token classification records from a pandas DataFrame
>>> df = pd.DataFrame({
...     "text": ["my example"],
...     "tokens": [["my", "example"]],
...     "prediction": [
...         [("LABEL1", 3, 10)]
...     ]
... })
>>> read_pandas(df, task="TokenClassification")
>>>
>>> # Read text2text records from a pandas DataFrame
>>> df = pd.DataFrame({
...     "text": ["my example"],
...     "prediction": [["mi ejemplo", "ejemplo mio"]]
... })
>>> # or
>>> df = pd.DataFrame({
...     "text": ["my example"],
...     "prediction": [[("mi ejemplo", 0.9)]]
... })
>>> read_pandas(df, task="Text2Text")

FeedbackDataset#

class argilla.client.feedback.dataset.local.dataset.FeedbackDataset(*, fields, questions, metadata_properties=None, vectors_settings=None, guidelines=None, allow_extra_metadata=True)#
Parameters:
  • fields (List[AllowedFieldTypes]) โ€“

  • questions (List[AllowedQuestionTypes]) โ€“

  • metadata_properties (Optional[List[AllowedMetadataPropertyTypes]]) โ€“

  • vectors_settings (Optional[List[VectorSettings]]) โ€“

  • guidelines (Optional[str]) โ€“

  • allow_extra_metadata (bool) โ€“

add_metadata_property(metadata_property)#

Adds the given metadata property to the dataset.

Parameters:

metadata_property (AllowedMetadataPropertyTypes) โ€“ the metadata property to add.

Returns:

The metadata property that was added.

Raises:
  • TypeError โ€“ if metadata_property is not a MetadataPropertySchema.

  • ValueError โ€“ if metadata_property is already in the dataset.

Return type:

AllowedMetadataPropertyTypes

add_records(records)#

Adds the given records to the dataset, and stores them locally. If you are planning to push those to Argilla, you will need to call push_to_argilla afterwards, to both create the dataset in Argilla and push the records to it. Then, from a FeedbackDataset pushed to Argilla, you'll just need to call add_records and those will be automatically uploaded to Argilla.

Parameters:

records (Union[FeedbackRecord, Dict[str, Any], List[Union[FeedbackRecord, Dict[str, Any]]]]) โ€“ can be a single FeedbackRecord, a list of FeedbackRecord, a single dictionary, or a list of dictionaries. If a dictionary is provided, it will be converted to a FeedbackRecord internally.

Raises:
  • ValueError โ€“ if the given records are an empty list.

  • ValueError โ€“ if the given records are neither: FeedbackRecord, list of FeedbackRecord, list of dictionaries as a record or dictionary as a record.

  • ValueError โ€“ if the given records do not match the expected schema.

Return type:

None
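
As a sketch, records can be added either as FeedbackRecord objects or as plain dictionaries; the "text" field below matches the field created by the for_text_classification template:

>>> import argilla as rg
>>> dataset = rg.FeedbackDataset.for_text_classification(labels=["positive", "negative"])
>>> dataset.add_records(
...     [
...         rg.FeedbackRecord(fields={"text": "I love this product"}),
...         {"fields": {"text": "I would not buy this product again"}},
...     ]
... )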

add_vector_settings(vector_settings)#

Adds a new vector_settings to the current FeedbackDataset.

Parameters:

vector_settings (VectorSettings) โ€“

Return type:

VectorSettings
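
A rough sketch, assuming rg.VectorSettings is configured with a name and the vector dimensions:

>>> import argilla as rg
>>> dataset = rg.FeedbackDataset.for_text_classification(labels=["positive", "negative"])
>>> dataset.add_vector_settings(rg.VectorSettings(name="sentence_embedding", dimensions=384))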

property allow_extra_metadata: bool#

Returns whether adding extra metadata to the records of the dataset is allowed.

compute_agreement_metrics(metric_names=None, question_name=None)#

Compute agreement or reliability of annotation metrics.

These metrics can be used to determine the level of agreement across our annotation team, or whether the guidelines are clear enough, for example.

Return type:

Union[AgreementMetricResult, List[AgreementMetricResult]]

Note

Currently, TextQuestion is not supported.

Returns:

The agreement metrics result, or a list of metric results if a list of metric names is provided.

compute_model_metrics(metric_names=None, question_name=None, strategy=None)#

Compute metrics for the annotators using the suggestions as the ground truth, and the responses as the predicted value, or if a strategy is provided, the same but applied to unified responses.

The metric interpretation is the same whether the responses are unified or not.

Parameters:
  • metric_names (Union[str, List[str]]) โ€“ Metric name or list of metric names of the metrics, dependent on the question type.

  • question_name (Union[str, LabelQuestion, MultiLabelQuestion, RatingQuestion, TextQuestion, RankingQuestion]) โ€“ Question for which we want to compute the metrics.

  • strategy (Optional[Union[str, LabelQuestionStrategy, MultiLabelQuestionStrategy, RatingQuestionStrategy, RankingQuestionStrategy]]) – Unification strategy. If given, will unify the responses of the dataset and compute the metrics on the unified responses vs the suggestions instead of on a per-user level. See the compute_unified_responses method for more information. Defaults to None.

Return type:

Union[Dict[str, List[ModelMetricResult]], ModelMetricResult, List[ModelMetricResult]]

Note

Currently, the following types of questions are supported:

  • For annotator-level questions: all the types of questions.

  • For unified responses: all the questions except the TextQuestion.

Returns:

If a strategy is provided, it will unify the annotations and return the metrics for the unified responses. Otherwise, it will return the metrics for each annotator as a dict, where the key corresponds to the annotator id and the values are a list with the metrics.

compute_unified_responses(question, strategy)#

The compute_unified_responses function takes a question and a strategy as input and applies the strategy to unify the responses for that question.

Parameters:
  • question (Union[str, LabelQuestion, MultiLabelQuestion, RatingQuestion]) – The question can be either a string representing the name of the question, or an instance of one of the question classes (LabelQuestion, MultiLabelQuestion, RatingQuestion, RankingQuestion).

  • strategy (Union[str, LabelQuestionStrategy, MultiLabelQuestionStrategy, RatingQuestionStrategy, RankingQuestionStrategy]) – The strategy used to unify the responses for the given question. It can be either a string or an instance of a strategy class.

Return type:

FeedbackDataset
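
For illustration, a sketch that unifies the responses of a hypothetical "label" question with a majority-vote strategy (valid strategy names depend on the question type; dataset is assumed to be a FeedbackDataset whose records already contain responses):

>>> unified_dataset = dataset.compute_unified_responses(
...     question="label",
...     strategy="majority",
... )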

delete()#

Deletes the FeedbackDataset from Argilla.

delete_metadata_properties(metadata_properties)#

Deletes the given metadata properties from the dataset.

Parameters:

metadata_properties (Union[str, List[str]]) โ€“ the name/s of the metadata property/ies to delete.

Returns:

The metadata properties that were deleted.

Raises:
  • TypeError โ€“ if metadata_properties is not a string or a list of strings.

  • ValueError โ€“ if the provided metadata_properties is/are not in the dataset.

Return type:

Union[AllowedMetadataPropertyTypes, List[AllowedMetadataPropertyTypes]]

delete_vectors_settings(vectors_settings)#

Deletes the given vector settings from the dataset.

Parameters:

vectors_settings (Union[str, List[str]]) โ€“ the name/s of the vector settings to delete.

Returns:

The vector settings that were deleted.

Raises:

ValueError โ€“ if the provided vectors_settings is/are not in the dataset.

Return type:

Union[VectorSettings, List[VectorSettings]]

field_by_name(name)#

Returns the field by name if it exists. Otherwise a ValueError is raised.

Parameters:

name (str) โ€“ the name of the field to return.

Return type:

Optional[AllowedFieldTypes]

property fields: List[AllowedFieldTypes]#

Returns the fields that define the schema of the records in the dataset.

filter_by(*args, **kwargs)#

Filters the current FeedbackDataset.

Return type:

FeedbackDataset
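
As a sketch for a dataset pushed to Argilla (response_status is one possible filter keyword and the dataset name is hypothetical):

>>> import argilla as rg
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> filtered_dataset = remote_dataset.filter_by(response_status="submitted")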

find_similar_records(vector_name, value=None, record=None, max_results=50)#

Finds similar records to the given record or value for the given vector_name.

Parameters:
  • vector_name (str) โ€“ a vector name to use for searching by similarity.

  • value (Optional[List[float]]) โ€“ an optional vector value to be used for searching by similarity.

  • record (Optional[R]) โ€“ an optional record to be used for searching by similarity.

  • max_results (int) โ€“ the maximum number of results for the search.

Returns:

A list of tuples with each tuple including a record and a similarity score.

Return type:

List[Tuple[FeedbackRecord, float]]
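
A minimal sketch of a similarity search by raw vector value ("sentence_embedding" is a hypothetical vector settings name and the numbers are placeholders):

>>> results = dataset.find_similar_records(
...     vector_name="sentence_embedding",
...     value=[0.1, 0.2, 0.3],
...     max_results=5,
... )
>>> for record, score in results:
...     print(score, record.fields)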

classmethod for_direct_preference_optimization(number_of_responses=2, context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for direct preference optimization tasks.

Parameters:
  • number_of_responses (int) โ€“ Set this parameter to the number of responses you want to add to your dataset

  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

  • guidelines (Optional[str]) โ€“

Returns:

A FeedbackDataset object for direct preference optimization containing "prompt", "response1", "response2" with the optional "context" fields and a RatingQuestion named "preference"

Return type:

FeedbackDataset

classmethod for_multi_modal_classification(labels, multi_label=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for multi-modal (video, audio, image) classification tasks.

Parameters:
  • labels (List[str]) โ€“ A list of labels for your dataset

  • multi_label (bool) โ€“ Set this parameter to True if you want to add multiple labels to your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for multi-modal classification containing a "content" field with video, audio or image data and LabelQuestion or MultiLabelQuestion named "label"

Return type:

FeedbackDataset

classmethod for_multi_modal_transcription(guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for multi-modal (video, audio, image) transcription tasks.

Parameters:
  • use_markdown โ€“ Set this parameter to True if you want to use markdown in your TextQuestion. Defaults to False.

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for multi-modal transcription containing a "content" field with video, audio or image data and a TextQuestion named "description"

Return type:

FeedbackDataset

classmethod for_natural_language_inference(labels=None, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for natural language inference tasks.

Parameters:
  • labels (Optional[List[str]]) โ€“ A list of labels for your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for natural language inference containing "premise" and "hypothesis" fields and a LabelQuestion named "label"

Return type:

FeedbackDataset

classmethod for_preference_modeling(number_of_responses=2, context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for preference tasks.

Parameters:
  • number_of_responses (int) โ€“ Set this parameter to the number of responses you want to add to your dataset

  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset.

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for preference containing "prompt", "option1" and "option2" fields and a RatingQuestion named "preference"

Return type:

FeedbackDataset

classmethod for_proximal_policy_optimization(rating_scale=7, context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for proximal policy optimization tasks.

Parameters:
  • rating_scale (int) – Set this parameter to the number of points in the relevancy rating scale you want to add to your dataset

  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for proximal policy optimization containing "context" and "action" fields and a LabelQuestion named "label"

Return type:

FeedbackDataset

classmethod for_question_answering(use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for question answering tasks.

Parameters:
  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for question answering containing "context" and "question" fields and a TextQuestion named "answer"

Return type:

FeedbackDataset

classmethod for_retrieval_augmented_generation(number_of_retrievals=1, rating_scale=7, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for retrieval augmented generation tasks.

Parameters:
  • number_of_retrievals (int) โ€“ Set this parameter to the number of documents you want to add to your dataset

  • rating_scale (int) – Set this parameter to the number of points in the relevancy rating scale you want to add to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for retrieval augmented generation containing "query" and "retrieved_document" fields and a TextQuestion named "response"

Return type:

FeedbackDataset

classmethod for_sentence_similarity(rating_scale=7, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for sentence similarity tasks.

Parameters:
  • rating_scale (int) – Set this parameter to the number of points in the similarity rating scale you want to add to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for sentence similarity containing "sentence1" and "sentence2" fields and a RatingQuestion named "similarity"

Return type:

FeedbackDataset

classmethod for_summarization(use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for summarization tasks.

Parameters:
  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for summarization containing "text" field and a TextQuestion named "summary"

Return type:

FeedbackDataset

classmethod for_supervised_fine_tuning(context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for supervised fine-tuning tasks.

Parameters:
  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for supervised fine-tuning containing "instruction" and optional "context" field and a TextQuestion named "response"

Return type:

FeedbackDataset

classmethod for_text_classification(labels, multi_label=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for text classification tasks.

Parameters:
  • labels (List[str]) โ€“ A list of labels for your dataset

  • multi_label (bool) โ€“ Set this parameter to True if you want to add multiple labels to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for text classification containing "text" field and LabelQuestion or MultiLabelQuestion named "label"

Return type:

FeedbackDataset
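
For illustration, a sketch of building the text classification template and inspecting what it defines (the labels and guidelines are made up):

>>> import argilla as rg
>>> dataset = rg.FeedbackDataset.for_text_classification(
...     labels=["positive", "negative"],
...     guidelines="Classify the sentiment of the text.",
... )
>>> dataset.fields
>>> dataset.questions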

classmethod for_translation(use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for translation tasks.

Parameters:
  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for translation containing "source" field and a TextQuestion named "target"

Return type:

FeedbackDataset

format_as(format)#

Formats the FeedbackDataset as a datasets.Dataset object.

Parameters:

format (Literal['datasets']) โ€“ the format to use to format the FeedbackDataset. Currently supported formats are: datasets.

Returns:

The FeedbackDataset.records formatted as a datasets.Dataset object.

Raises:

ValueError โ€“ if the provided format is not supported.

Return type:

Dataset

Examples

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> huggingface_dataset = dataset.format_as("datasets")
classmethod from_argilla(name=None, *, workspace=None, id=None, with_vectors=None)#

Retrieves an existing FeedbackDataset from Argilla (must have been pushed in advance).

Note that even though no argument is mandatory, you must provide either the name, the combination of name and workspace, or the id, otherwise an error will be raised.

Parameters:
  • name (Optional[str]) โ€“ the name of the FeedbackDataset to retrieve from Argilla. Defaults to None.

  • workspace (Optional[str]) โ€“ the workspace of the FeedbackDataset to retrieve from Argilla. If not provided, the active workspace will be used.

  • id (Optional[Union[str, UUID]]) โ€“ the ID of the FeedbackDataset to retrieve from Argilla. Defaults to None.

  • with_vectors (Optional[Union[Literal['all'], List[str]]]) – the vector settings to retrieve from Argilla. Use "all" to download all vectors. Defaults to None.

Returns:

The RemoteFeedbackDataset retrieved from Argilla.

Raises:

ValueError โ€“ if no FeedbackDataset with the provided name, workspace, or id exists in Argilla.

Return type:

RemoteFeedbackDataset

Examples

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.from_argilla(name="my_dataset")
classmethod from_huggingface(repo_id, show_progress=True, *args, **kwargs)#

Loads a FeedbackDataset from the Hugging Face Hub.

Parameters:
  • repo_id (str) – the ID of the Hugging Face Hub repo to load the FeedbackDataset from.

  • show_progress (bool) – whether to show a progress bar while loading the dataset. Defaults to True.

  • *args (Any) – the args to pass to datasets.Dataset.load_from_hub.

  • **kwargs (Any) – the kwargs to pass to datasets.Dataset.load_from_hub.

Returns:

A FeedbackDataset loaded from the Hugging Face Hub.

Return type:

FeedbackDataset
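
Examples

A minimal sketch; the repo ID below is a placeholder for an existing FeedbackDataset repo on the Hugging Face Hub:

>>> import argilla as rg
>>> dataset = rg.FeedbackDataset.from_huggingface(
...     repo_id="my-org/my-feedback-dataset",  # placeholder repo ID
...     show_progress=True,
... )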

property guidelines: Optional[str]#

Returns the guidelines for annotating the dataset.

iter(batch_size=250)#

Returns an iterator over the records in the dataset.

Parameters:

batch_size (Optional[int]) – the size of the batches to return. Defaults to 250.

Return type:

Iterator[List[FeedbackRecord]]
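
Examples

A minimal sketch that pulls a dataset and iterates over its records in batches; the dataset name is illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.from_argilla(name="my-dataset").pull()
>>> for batch in dataset.iter(batch_size=100):
...     print(len(batch))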

classmethod list(workspace=None)#

Lists the FeedbackDataset datasets pushed to Argilla.

Note that you may need to rg.init(โ€ฆ) with your Argilla credentials before calling this function, otherwise, the default http://localhost:6900 will be used, which will fail if Argilla is not deployed locally.

Parameters:

workspace (Optional[str]) โ€“ the workspace where to list the datasets from. If not provided, then the workspace filtering wonโ€™t be applied. Defaults to None.

Returns:

A list of RemoteFeedbackDataset datasets, which are FeedbackDataset datasets previously pushed to Argilla via push_to_argilla.

Return type:

List[RemoteFeedbackDataset]
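
Examples

A minimal sketch; the workspace name is illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> datasets = rg.FeedbackDataset.list(workspace="my-workspace")
>>> [dataset.name for dataset in datasets]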

property metadata_properties: List[AllowedMetadataPropertyTypes]#

Returns the metadata properties that will be indexed and could be used to filter the dataset.

metadata_property_by_name(name)#

Returns the metadata property by name if it exists.

Parameters:

name (str) โ€“ the name of the metadata property to return.

Return type:

Optional[AllowedMetadataPropertyTypes]

prepare_for_training(framework, task, train_size=1, test_size=None, seed=None, lang=None)#

Prepares the dataset for training for a specific training framework and NLP task by splitting the dataset into train and test sets.

Parameters:
  • framework (Union[Framework, str]) โ€“ the framework to use for training. Currently supported frameworks are: transformers, peft, setfit, spacy, spacy-transformers, span_marker, spark-nlp, openai, trl, sentence-transformers.

  • task (Union[TrainingTaskForTextClassification, TrainingTaskForSFT, TrainingTaskForRM, TrainingTaskForPPO, TrainingTaskForDPO, TrainingTaskForChatCompletion, TrainingTaskForSentenceSimilarity]) โ€“ the NLP task to use for training. Currently supported tasks are: TrainingTaskForTextClassification, TrainingTaskForSFT, TrainingTaskForRM, TrainingTaskForPPO, TrainingTaskForDPO, TrainingTaskForSentenceSimilarity.

  • train_size (Optional[float]) โ€“ the size of the train set. If None, the whole dataset will be used for training.

  • test_size (Optional[float]) โ€“ the size of the test set. If None, the whole dataset will be used for testing.

  • seed (Optional[int]) โ€“ the seed to use for splitting the dataset into train and test sets.

  • lang (Optional[str]) โ€“ the spaCy language to use for training. If None, the language of the dataset will be used.

Return type:

Any
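
Examples

A minimal sketch, assuming the TrainingTask helper from argilla.feedback and a dataset with a "text" field and a "label" question; the framework and split values are illustrative:

>>> import argilla as rg
>>> from argilla.feedback import TrainingTask  # assumed helper for building the task objects listed above
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> task = TrainingTask.for_text_classification(  # assumes a "text" field and a "label" question
...     text=dataset.field_by_name("text"),
...     label=dataset.question_by_name("label"),
... )
>>> train_data = dataset.prepare_for_training(framework="transformers", task=task, train_size=0.8, seed=42)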

pull(*args, **kwargs)#

Pulls the dataset from Argilla and returns a local instance of it.

Return type:

FeedbackDataset

push_to_argilla(name, workspace=None, show_progress=True)#

Pushes the FeedbackDataset to Argilla.

Note that you may need to rg.init(โ€ฆ) with your Argilla credentials before calling this function, otherwise the default http://localhost:6900 will be used, which will fail if Argilla is not deployed locally.

Parameters:
  • name (str) โ€“ the name of the dataset to push to Argilla.

  • workspace (Optional[Union[str, Workspace]]) โ€“ the workspace where to push the dataset to. If not provided, the active workspace will be used.

  • show_progress (bool) โ€“ the option to choose to show/hide tqdm progress bar while looping over records.

Returns:

The FeedbackDataset pushed to Argilla, which is now an instance of RemoteFeedbackDataset.

Return type:

RemoteFeedbackDataset
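
Examples

A minimal sketch; the labels, dataset name and workspace name are illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.for_text_classification(labels=["positive", "negative"])
>>> remote_dataset = dataset.push_to_argilla(name="my-dataset", workspace="my-workspace")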

push_to_huggingface(repo_id, generate_card=True, *args, **kwargs)#

Pushes the FeedbackDataset to the Hugging Face Hub. If the dataset has been previously pushed to the Hugging Face Hub, it will be updated instead. Note that some parameters, such as private, have no effect at all when the dataset has already been uploaded to the Hugging Face Hub.

Parameters:
  • dataset โ€“ the FeedbackDataset to push to the Hugging Face Hub.

  • repo_id (str) โ€“ the ID of the Hugging Face Hub repo to push the FeedbackDataset to.

  • generate_card (Optional[bool]) โ€“ whether to generate a dataset card for the FeedbackDataset in the Hugging Face Hub. Defaults to True.

  • *args โ€“ the args to pass to datasets.Dataset.push_to_hub.

  • **kwargs โ€“ the kwargs to pass to datasets.Dataset.push_to_hub.

  • self (FeedbackDataset) โ€“

Return type:

None
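
Examples

A minimal sketch; the repo ID is a placeholder and it is assumed you are already authenticated against the Hugging Face Hub:

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.from_argilla(name="my-dataset").pull()
>>> dataset.push_to_huggingface(repo_id="my-org/my-feedback-dataset", generate_card=True)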

question_by_name(name)#

Returns the question by name if it exists.

Parameters:

name (str) โ€“ the name of the question to return.

Return type:

Optional[AllowedQuestionTypes]

property questions: List[AllowedQuestionTypes]#

Returns the questions that will be used to annotate the dataset.

property records: List[FeedbackRecord]#

Returns the records in the dataset.

sort_by(field, order=SortOrder.asc)#

Sorts the records in the dataset by the given field.

Parameters:
  • field (Union[str, RecordSortField]) โ€“

  • order (Union[str, SortOrder]) โ€“

Return type:

FeedbackDataset

update_metadata_properties(metadata_properties)#

Does nothing, because for local FeedbackDataset datasets the metadata_properties are updated automatically when a new value is assigned to their updatable attributes.

Parameters:

metadata_properties (Union[AllowedMetadataPropertyTypes, List[AllowedMetadataPropertyTypes]]) โ€“

Return type:

None

update_records(records)#

Updates the records of the dataset.

Parameters:

records (Union[FeedbackRecord, List[FeedbackRecord]]) โ€“ the records to update the dataset with.

Raises:

ValueError โ€“ if the provided records are invalid.

Return type:

None

update_vectors_settings(vectors_settings)#

Does nothing, because for local FeedbackDataset datasets the vectors_settings are updated automatically when a new value is assigned to their updatable attributes.

Parameters:

vectors_settings (Union[VectorSettings, List[VectorSettings]]) โ€“

Return type:

None

vector_settings_by_name(name)#

Returns the vector settings by name if it exists.

Parameters:

name (str) โ€“ the name of the vector settings to return.

Raises:

KeyError โ€“ if the vector settings with the given name does not exist.

Return type:

Optional[AllowedVectorSettingsTypes]

property vectors_settings: List[VectorSettings]#

Returns the vector settings of the dataset.

class argilla.client.feedback.dataset.local.mixins.TaskTemplateMixin#

Mixin to add task template functionality to a FeedbackDataset. The NLP tasks covered are:

โ€œtext_classificationโ€ โ€œextractive_question_answeringโ€ โ€œsummarizationโ€ โ€œtranslationโ€ โ€œsentence_similarityโ€ โ€œnatural_language_inferenceโ€ โ€œsupervised_fine_tuningโ€ โ€œpreference_modeling/reward_modelingโ€ โ€œproximal_policy_optimizationโ€ โ€œdirect_preference_optimizationโ€ โ€œretrieval_augmented_generationโ€ โ€œmulti_modal_classificationโ€ โ€œmulti_modal_transcriptionโ€

classmethod for_direct_preference_optimization(number_of_responses=2, context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for direct preference optimization tasks.

Parameters:
  • number_of_responses (int) โ€“ Set this parameter to the number of responses you want to add to your dataset

  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

  • guidelines (Optional[str]) โ€“

Returns:

A FeedbackDataset object for direct preference optimization containing “prompt”, “response1” and “response2” fields, an optional “context” field, and a RatingQuestion named “preference”

Return type:

FeedbackDataset
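
Examples

A minimal sketch of creating a direct preference optimization dataset from this template:

>>> import argilla as rg
>>> dataset = rg.FeedbackDataset.for_direct_preference_optimization(
...     number_of_responses=2,
...     context=True,
... )
>>> dataset.fields
>>> dataset.questions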

classmethod for_multi_modal_classification(labels, multi_label=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for multi-modal (video, audio, image) classification tasks.

Parameters:
  • labels (List[str]) โ€“ A list of labels for your dataset

  • multi_label (bool) โ€“ Set this parameter to True if you want to add multiple labels to your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for multi-modal classification containing a “content” field with video, audio or image data and a LabelQuestion or MultiLabelQuestion named “label”

Return type:

FeedbackDataset

classmethod for_multi_modal_transcription(guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for multi-modal (video, audio, image) transcription tasks.

Parameters:
  • use_markdown โ€“ Set this parameter to True if you want to use markdown in your TextQuestion. Defaults to False.

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for multi-modal transcription containing a โ€œcontentโ€ field with video, audio or image data and a TextQuestion named โ€œdescriptionโ€

Return type:

FeedbackDataset

classmethod for_natural_language_inference(labels=None, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for natural language inference tasks.

Parameters:
  • labels (Optional[List[str]]) โ€“ A list of labels for your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for natural language inference containing โ€œpremiseโ€ and โ€œhypothesisโ€ fields and a LabelQuestion named โ€œlabelโ€

Return type:

FeedbackDataset

classmethod for_preference_modeling(number_of_responses=2, context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for preference tasks.

Parameters:
  • number_of_responses (int) โ€“ Set this parameter to the number of responses you want to add to your dataset

  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset.

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for preference modeling containing “prompt”, “option1” and “option2” fields and a RatingQuestion named “preference”

Return type:

FeedbackDataset

classmethod for_proximal_policy_optimization(rating_scale=7, context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for proximal policy optimization tasks.

Parameters:
  • rating_scale (int) – Set this parameter to the number of points on the relevancy scale you want to use in your dataset

  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for proximal policy optimization containing โ€œcontextโ€ and โ€œactionโ€ fields and a LabelQuestion named โ€œlabelโ€

Return type:

FeedbackDataset

classmethod for_question_answering(use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for question answering tasks.

Parameters:
  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for question answering containing โ€œcontextโ€ and โ€œquestionโ€ fields and a TextQuestion named โ€œanswerโ€

Return type:

FeedbackDataset

classmethod for_retrieval_augmented_generation(number_of_retrievals=1, rating_scale=7, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for retrieval augmented generation tasks.

Parameters:
  • number_of_retrievals (int) โ€“ Set this parameter to the number of documents you want to add to your dataset

  • rating_scale (int) – Set this parameter to the number of points on the relevancy scale you want to use in your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for retrieval augmented generation containing โ€œqueryโ€ and โ€œretrieved_documentโ€ fields and a TextQuestion named โ€œresponseโ€

Return type:

FeedbackDataset

classmethod for_sentence_similarity(rating_scale=7, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for sentence similarity tasks.

Parameters:
  • rating_scale (int) – Set this parameter to the number of points on the similarity scale you want to use in your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for sentence similarity containing โ€œsentence1โ€ and โ€œsentence2โ€ fields and a RatingQuestion named โ€œsimilarityโ€

Return type:

FeedbackDataset

classmethod for_summarization(use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for summarization tasks.

Parameters:
  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for summarization containing a “text” field and a TextQuestion named “summary”

Return type:

FeedbackDataset

classmethod for_supervised_fine_tuning(context=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for supervised fine-tuning tasks.

Parameters:
  • context (bool) โ€“ Set this parameter to True if you want to add context to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for supervised fine-tuning containing an “instruction” field, an optional “context” field, and a TextQuestion named “response”

Return type:

FeedbackDataset

classmethod for_text_classification(labels, multi_label=False, use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for text classification tasks.

Parameters:
  • labels (List[str]) โ€“ A list of labels for your dataset

  • multi_label (bool) โ€“ Set this parameter to True if you want to add multiple labels to your dataset

  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for text classification containing a “text” field and a LabelQuestion or MultiLabelQuestion named “label”

Return type:

FeedbackDataset
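
Examples

A minimal sketch; the labels, dataset name and workspace name are illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.for_text_classification(
...     labels=["positive", "negative"],
...     multi_label=False,
...     use_markdown=True,
... )
>>> dataset.push_to_argilla(name="sentiment-feedback", workspace="my-workspace")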

classmethod for_translation(use_markdown=False, guidelines=None, metadata_properties=None, vectors_settings=None)#

You can use this method to create a basic dataset for translation tasks.

Parameters:
  • use_markdown (bool) โ€“ Set this parameter to True if you want to use markdown in your dataset

  • guidelines (Optional[str]) โ€“ Contains the guidelines for the dataset

  • metadata_properties (List[Union[TermsMetadataProperty, FloatMetadataProperty, IntegerMetadataProperty]]) โ€“ Contains the metadata properties that will be indexed and could be used to filter the dataset. Defaults to None.

  • vectors_settings (List[VectorSettings]) โ€“ Define the configuration of the vectors associated to the records that will be used to perform the vector search. Defaults to None.

Returns:

A FeedbackDataset object for translation containing a “source” field and a TextQuestion named “target”

Return type:

FeedbackDataset

class argilla.client.feedback.dataset.remote.dataset.RemoteFeedbackDataset(*, client, id, name, workspace, created_at, updated_at, fields, questions, guidelines=None, allow_extra_metadata=True, with_vectors=None)#
Parameters:
  • client (httpx.Client) โ€“

  • id (UUID) โ€“

  • name (str) โ€“

  • workspace (Workspace) โ€“

  • created_at (datetime) โ€“

  • updated_at (datetime) โ€“

  • fields (List[AllowedRemoteFieldTypes]) โ€“

  • questions (List[AllowedRemoteQuestionTypes]) โ€“

  • guidelines (Optional[str]) โ€“

  • allow_extra_metadata (bool) โ€“

  • with_vectors (Optional[Union[Literal['all'], ~typing.List[str]]]) โ€“

add_metadata_property(metadata_property)#

Adds a new metadata_property to the current FeedbackDataset in Argilla.

Note

Existing FeedbackRecords, if any, will remain unchanged if they contain metadata with the same name as the metadata_property but added before the metadata_property itself was added.

Parameters:

metadata_property (AllowedMetadataPropertyTypes) โ€“ the metadata property to add to the current FeedbackDataset in Argilla.

Returns:

The newly added metadata_property to the current FeedbackDataset in Argilla.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • RuntimeError โ€“ if the metadata_property cannot be added to the current FeedbackDataset in Argilla.

Return type:

AllowedRemoteMetadataPropertyTypes
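
Examples

A minimal sketch, assuming TermsMetadataProperty is importable from the top-level argilla package; the property name and values are illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> remote_dataset.add_metadata_property(
...     rg.TermsMetadataProperty(name="group", values=["group-a", "group-b"])  # assumed top-level export
... )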

add_records(records, show_progress=True)#

Adds the given records to the dataset and pushes those to Argilla.

Parameters:
  • records (Union[FeedbackRecord, Dict[str, Any], List[Union[FeedbackRecord, Dict[str, Any]]]]) โ€“ can be a single FeedbackRecord, a list of FeedbackRecord, a single dictionary, or a list of dictionaries. If a dictionary is provided, it will be converted to a FeedbackRecord internally.

  • show_progress (bool) โ€“ if True, shows a progress bar while pushing the records to Argilla. Defaults to True.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • ValueError โ€“ if the given records are neither: FeedbackRecord, list of FeedbackRecord, list of dictionaries as a record or dictionary as a record; or if the given records do not match the expected schema.

Return type:

None
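
Examples

A minimal sketch; the field name "text" assumes the dataset was created with a matching TextField:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> remote_dataset.add_records(
...     [rg.FeedbackRecord(fields={"text": "This is a record"})],  # "text" must match a dataset field
...     show_progress=True,
... )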

add_vector_settings(vector_settings)#

Adds a new vector settings to the current FeedbackDataset in Argilla.

Parameters:

vector_settings (VectorSettings) โ€“ the vector settings to add.

Returns:

The newly added vector settings to the current FeedbackDataset in Argilla.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • ValueError โ€“ if the vector settings with the given name already exists in the dataset in Argilla.

Return type:

RemoteVectorSettings
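
Examples

A minimal sketch, assuming VectorSettings is importable from the top-level argilla package; the vector name and dimensions are illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> remote_dataset.add_vector_settings(
...     rg.VectorSettings(name="sentence-embedding", dimensions=384)  # assumed top-level export
... )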

property allow_extra_metadata: bool#

Returns whether adding extra metadata to the records of the dataset is allowed

compute_agreement_metrics(metric_names=None, question_name=None)#

Computes agreement or reliability of annotation metrics.

These metrics can be used to determine the level of agreement across the annotation team, or whether the guidelines are clear enough, for example.

Parameters:
  • metric_names (Union[str, List[str]]) – metric name or list of metric names to compute, dependent on the question type.

  • question_name – the question for which to compute the agreement metrics.

Note

Currently, TextQuestion is not supported.

Returns:

An agreement metrics result, or a list of metrics results if a list of metric names is provided.

Return type:

Union[AgreementMetricResult, List[AgreementMetricResult]]

compute_model_metrics(metric_names=None, question_name=None, strategy=None)#

Computes metrics for the annotators, using the suggestions as the ground truth and the responses as the predicted value, or, if a strategy is provided, the same metrics applied to the unified responses.

The metric interpretation is the same whether the responses are unified or not.

Parameters:
  • metric_names (Union[str, List[str]]) โ€“ Metric name or list of metric names of the metrics, dependent on the question type.

  • question_name (Union[str, LabelQuestion, MultiLabelQuestion, RatingQuestion, TextQuestion, RankingQuestion]) โ€“ Question for which we want to compute the metrics.

  • strategy (Optional[Union[str, LabelQuestionStrategy, MultiLabelQuestionStrategy, RatingQuestionStrategy, RankingQuestionStrategy]]) – Unification strategy. If given, the responses of the dataset will be unified and the metrics computed on the unified responses vs the suggestions, instead of on a per-user level. See the compute_unified_responses method for more information. Defaults to None.

Note

Currently, the following types of questions are supported:
  • For annotator level questions: all the types of questions.
  • For unified responses: all the questions except TextQuestion.

Returns:

If a strategy is provided, the annotations are unified and the metrics for the unified responses are returned. Otherwise, the metrics for each annotator are returned as a dict, where the key is the annotator id and the value is a list of metrics.

Return type:

Union[Dict[str, List[ModelMetricResult]], ModelMetricResult, List[ModelMetricResult]]

compute_unified_responses(question, strategy)#

The compute_unified_responses function takes a question and a strategy as input and applies the strategy to unify the responses for that question.

Parameters:
  • question (Union[str, LabelQuestion, MultiLabelQuestion, RatingQuestion]) – the question whose responses will be unified. It can be either a string representing the name of the question, or an instance of one of the question classes (LabelQuestion, MultiLabelQuestion, RatingQuestion, RankingQuestion).

  • strategy (Union[str, LabelQuestionStrategy, MultiLabelQuestionStrategy, RatingQuestionStrategy, RankingQuestionStrategy]) – the strategy used to unify the responses for the given question. It can be either a string or an instance of a strategy class.

Return type:

FeedbackDataset
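
Examples

A minimal sketch, assuming the dataset has a RatingQuestion named "quality" and that "majority" is a valid strategy for that question type:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> # "quality" and "majority" are illustrative; adjust them to your dataset
>>> unified_dataset = remote_dataset.compute_unified_responses(question="quality", strategy="majority")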

property created_at: datetime#

Returns the datetime when the dataset was created in Argilla.

delete()#

Deletes the current FeedbackDataset from Argilla. This method only works if the user has either the owner or admin role.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • RuntimeError โ€“ if the FeedbackDataset cannot be deleted from Argilla.

Return type:

None

delete_metadata_properties(metadata_properties)#

Deletes a list of metadata_properties from the current FeedbackDataset in Argilla.

Note

Existing FeedbackRecords, if any, will remain unchanged if they contain metadata with the same name as the metadata_properties to delete, but the validation will be removed, as well as the metadata_property index, which means it will no longer be usable for filtering.

Parameters:

metadata_properties (Union[str, List[str]]) โ€“ the metadata property/ies name/s to delete from the current FeedbackDataset in Argilla.

Returns:

The metadata_property or metadata_properties deleted from the current FeedbackDataset in Argilla, but using the local schema e.g. if you delete a RemoteFloatMetadataProperty this method will delete it from Argilla and will return a FloatMetadataProperty instance.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • RuntimeError โ€“ if the metadata_properties cannot be deleted from the current FeedbackDataset in Argilla.

Return type:

Union[AllowedMetadataPropertyTypes, List[AllowedMetadataPropertyTypes]]

delete_records(records)#

Deletes the given records from the dataset in Argilla.

Parameters:

records (Union[RemoteFeedbackRecord, List[RemoteFeedbackRecord]]) – the records to delete from the dataset. Can be a single record or a list of records. The records must have been previously pushed to Argilla; otherwise they won’t be deleted.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • RuntimeError โ€“ If the deletion of the records from Argilla fails.

Return type:

None
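
Examples

A minimal sketch that deletes the first record of the dataset; the dataset name is illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> record_to_delete = list(remote_dataset.records)[0]
>>> remote_dataset.delete_records(record_to_delete)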

delete_vectors_settings(vectors_settings)#

Deletes the given vectors settings from the current FeedbackDataset in Argilla.

Parameters:

vectors_settings (Union[str, List[str]]) โ€“ the name/s of the vectors settings to delete.

Returns:

The vectors settings deleted from the current FeedbackDataset in Argilla.

Raises:

ValueError โ€“ if the given vectors settings do not exist in the current FeedbackDataset in Argilla.

Return type:

Union[RemoteVectorSettings, List[RemoteVectorSettings]]

field_by_name(name)#

Returns the field by name if it exists. Otherwise a ValueError is raised.

Parameters:

name (str) โ€“ the name of the field to return.

Return type:

Optional[AllowedFieldTypes]

property fields: List[AllowedRemoteFieldTypes]#

Returns the fields that define the schema of the records in the dataset.

filter_by(*, response_status=None, metadata_filters=None)#

Filters the current RemoteFeedbackDataset based on the response_status of the responses and/or the metadata of the records in Argilla. This method creates a new class instance of FilteredRemoteFeedbackDataset with the given filters.

Parameters:
  • response_status (Optional[Union[ResponseStatusFilter, List[ResponseStatusFilter]]]) โ€“ the response status/es to filter the dataset by. Can be one of: draft, pending, submitted, and discarded. Defaults to None.

  • metadata_filters (Optional[Union[MetadataFilters, List[MetadataFilters]]]) โ€“ the metadata filters to filter the dataset by. Can be one of: TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter. Defaults to None.

Returns:

A new instance of FilteredRemoteFeedbackDataset with the given filters.

Return type:

RemoteFeedbackDataset
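
Examples

A minimal sketch that keeps only the records with submitted responses; the dataset name is illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> filtered_dataset = remote_dataset.filter_by(response_status="submitted")
>>> for record in filtered_dataset.records:
...     print(record.fields)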

find_similar_records(vector_name, value=None, record=None, max_results=50)#

Finds similar records to the given record in the dataset based on the given vector.

Parameters:
  • vector_name (str) โ€“ a vector name to use for searching by similarity.

  • value (Optional[List[float]]) โ€“ an optional vector value to be used for searching by similarity. Defaults to None.

  • record (Optional[RemoteFeedbackRecord]) โ€“ an optional record to be used for searching by similarity. Defaults to None.

  • max_results (int) โ€“ the maximum number of results for the search. Defaults to 50.

Returns:

A list of tuples with each tuple including a record and a similarity score.

Return type:

List[Tuple[RemoteFeedbackRecord, float]]
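
Examples

A minimal sketch, assuming the dataset has vector settings named "sentence-embedding" and its records were pushed with vectors under that name:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", with_vectors="all")
>>> reference_record = list(remote_dataset.records)[0]
>>> similar = remote_dataset.find_similar_records(
...     vector_name="sentence-embedding",  # illustrative vector name
...     record=reference_record,
...     max_results=5,
... )
>>> for record, score in similar:
...     print(score, record.fields)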

format_as(format)#

Formats the FeedbackDataset as a datasets.Dataset object.

Parameters:

format (Literal['datasets']) โ€“ the format to use to format the FeedbackDataset. Currently supported formats are: datasets.

Returns:

The FeedbackDataset.records formatted as a datasets.Dataset object.

Raises:

ValueError โ€“ if the provided format is not supported.

Return type:

Dataset

Examples

>>> import argilla as rg
>>> rg.init(...)
>>> dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> huggingface_dataset = dataset.format_as("datasets")
property guidelines: Optional[str]#

Returns the guidelines for annotating the dataset.

property id: UUID#

Returns the ID of the dataset in Argilla.

property metadata_properties: List[AllowedRemoteMetadataPropertyTypes]#

Retrieves the metadata_properties of the current dataset from Argilla, and returns them if any, otherwise, it returns an empty list.

metadata_property_by_name(name)#

Returns the metadata property by name if it exists.

Parameters:

name (str) โ€“ the name of the metadata property to return.

Return type:

Optional[AllowedMetadataPropertyTypes]

property name: str#

Returns the name of the dataset in Argilla.

prepare_for_training(framework, task, train_size=1, test_size=None, seed=None, lang=None)#

Prepares the dataset for training for a specific training framework and NLP task by splitting the dataset into train and test sets.

Parameters:
  • framework (Union[Framework, str]) โ€“ the framework to use for training. Currently supported frameworks are: transformers, peft, setfit, spacy, spacy-transformers, span_marker, spark-nlp, openai, trl, sentence-transformers.

  • task (Union[TrainingTaskForTextClassification, TrainingTaskForSFT, TrainingTaskForRM, TrainingTaskForPPO, TrainingTaskForDPO, TrainingTaskForChatCompletion, TrainingTaskForSentenceSimilarity]) โ€“ the NLP task to use for training. Currently supported tasks are: TrainingTaskForTextClassification, TrainingTaskForSFT, TrainingTaskForRM, TrainingTaskForPPO, TrainingTaskForDPO, TrainingTaskForSentenceSimilarity.

  • train_size (Optional[float]) โ€“ the size of the train set. If None, the whole dataset will be used for training.

  • test_size (Optional[float]) โ€“ the size of the test set. If None, the whole dataset will be used for testing.

  • seed (Optional[int]) โ€“ the seed to use for splitting the dataset into train and test sets.

  • lang (Optional[str]) โ€“ the spaCy language to use for training. If None, the language of the dataset will be used.

Return type:

Any

pull(max_records=None)#

Pulls the dataset from Argilla and returns a local instance of it.

Parameters:

max_records (Optional[int]) โ€“ the maximum number of records to pull from Argilla. Defaults to None.

Returns:

A local instance of the dataset which is a FeedbackDataset object.

Return type:

FeedbackDataset
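
Examples

A minimal sketch that pulls the first 100 records into a local FeedbackDataset; the dataset name is illustrative:

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> local_dataset = remote_dataset.pull(max_records=100)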

push_to_argilla(name, workspace=None, show_progress=False)#

Pushes the FeedbackDataset to Argilla.

Parameters:
  • name (str) โ€“

  • workspace (Optional[Union[str, Workspace]]) โ€“

  • show_progress (bool) โ€“

Return type:

RemoteFeedbackDataset

push_to_huggingface(repo_id, generate_card=True, *args, **kwargs)#

Pushes the current FeedbackDataset to HuggingFace Hub.

Note

The records from the RemoteFeedbackDataset are pulled before pushing, to ensure that there is no mismatch while uploading them, as they are lazily fetched.

Parameters:
  • repo_id (str) โ€“ the ID of the HuggingFace repo to push the dataset to.

  • generate_card (Optional[bool]) โ€“ whether to generate a dataset card or not. Defaults to True.

Return type:

None

question_by_name(name)#

Returns the question by name if it exists.

Parameters:

name (str) โ€“ the name of the question to return.

Return type:

Optional[AllowedQuestionTypes]

property questions: List[AllowedRemoteQuestionTypes]#

Returns the questions that will be used to annotate the dataset.

property records: RemoteFeedbackRecords#

Returns an instance of RemoteFeedbackRecords that allows you to iterate over the records in the dataset. The records are fetched from Argilla on the fly and not stored in memory. You can also iterate over the records directly from the dataset instance.

sort_by(sort)#

Sorts the current RemoteFeedbackDataset based on the given sort fields and orders.

Parameters:

sort (List[SortBy]) โ€“

Return type:

RemoteFeedbackDataset
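
Examples

A minimal sketch, assuming SortBy is importable from the top-level argilla package and that the dataset has a metadata property named "my-metadata":

>>> import argilla as rg
>>> rg.init(...)
>>> remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset")
>>> sorted_dataset = remote_dataset.sort_by(
...     [rg.SortBy(field="metadata.my-metadata", order="asc")]  # assumed top-level export
... )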

update_metadata_properties(metadata_properties)#

Updates a list of metadata_properties in the current FeedbackDataset in Argilla.

Note

All the metadata_properties provided must already exist in Argilla, and they will be pushed again to Argilla with their current values, which ideally should have been updated via assignment, e.g. metadata_property.title = “…”.

Parameters:

metadata_properties (Union[AllowedRemoteMetadataPropertyTypes, List[AllowedRemoteMetadataPropertyTypes]]) โ€“ the metadata property/ies to update in the current FeedbackDataset in Argilla.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • RuntimeError โ€“ if the metadata_properties cannot be updated in the current FeedbackDataset in Argilla.

Return type:

None

update_records(records, show_progress=True)#

Updates the given records in the dataset in Argilla.

Parameters:
  • records (Union[RemoteFeedbackRecord, List[RemoteFeedbackRecord]]) โ€“ the records to update in the dataset. Can be a single record or a list of records. The records need to be previously pushed to Argilla, otherwise they wonโ€™t be updated.

  • show_progress (bool) โ€“ if True, shows a progress bar while pushing the records to Argilla. Defaults to True.

Raises:

PermissionError โ€“ if the user does not have either owner or admin role.

Return type:

None

update_vectors_settings(vectors_settings)#

Updates the given vector settings in the current FeedbackDataset in Argilla.

Parameters:

vectors_settings (Union[RemoteVectorSettings, List[RemoteVectorSettings]]) โ€“ the remote vectors settings to update. Must exist in Argilla in advance.

Raises:
  • PermissionError โ€“ if the user does not have either owner or admin role.

  • RuntimeError โ€“ if the vector settings cannot be updated in the current FeedbackDataset in Argilla.

Return type:

None

property updated_at: datetime#

Returns the datetime when the dataset was last updated in Argilla.

property url: str#

Returns the URL of the dataset in Argilla.

vector_settings_by_name(name)#

Returns the vector settings by name if it exists.

Parameters:

name (str) โ€“ the name of the vector settings to return.

Raises:

KeyError โ€“ if the vector settings with the given name does not exist.

Return type:

Optional[AllowedVectorSettingsTypes]

property vectors_settings: List[RemoteVectorSettings]#

Retrieves the vectors_settings of the current dataset from Argilla

property workspace: Workspace#

Returns the workspace the dataset belongs to in Argilla.

class argilla.client.feedback.schemas.questions.LabelQuestion(*, name, title=None, description=None, required=True, type='label_selection', labels, visible_labels='undefined')#

Schema for the FeedbackDataset label questions, which are the ones that will require a label response from the user. This class should be used when the user can only select one label.

Parameters:
  • type (Literal[<QuestionTypes.label_selection: 'label_selection'>]) โ€“ The type of the question. Defaults to โ€˜label_selectionโ€™ and cannot/shouldnโ€™t be modified.

  • labels (Union[ConstrainedListValue[str], Dict[str, str]]) โ€“ The list of labels of the label question. The labels must be unique, and the list must contain at least two unique labels. Additionally, labels can also be a dictionary of labels, where the keys are the labels, and the values are the labels that will be shown in the UI.

  • visible_labels (Optional[Union[Literal['undefined'], ~argilla.client.feedback.schemas.questions.ConstrainedIntValue]]) โ€“ The number of visible labels in the UI. Defaults to 20, and must be 3 or greater.

  • name (str) โ€“

  • title (Optional[str]) โ€“

  • description (Optional[str]) โ€“

  • required (bool) โ€“

Examples

>>> from argilla.client.feedback.schemas.questions import LabelQuestion
>>> LabelQuestion(name="label_question", title="Label Question", labels=["label_1", "label_2"])
class argilla.client.feedback.schemas.questions.MultiLabelQuestion(*, name, title=None, description=None, required=True, type='multi_label_selection', labels, visible_labels='undefined')#

Schema for the FeedbackDataset label questions, which are the ones that will require a label response from the user. This class should be used when the user can select multiple labels.

Parameters:
  • type (Literal[<QuestionTypes.multi_label_selection: 'multi_label_selection'>]) โ€“ The type of the question. Defaults to โ€˜multi_label_selectionโ€™ and cannot/shouldnโ€™t be modified.

  • labels (Union[ConstrainedListValue[str], Dict[str, str]]) โ€“ The list of labels of the label question. The labels must be unique, and the list must contain at least two unique labels. Additionally, labels can also be a dictionary of labels, where the keys are the labels, and the values are the labels that will be shown in the UI.

  • visible_labels (Optional[Union[Literal['undefined'], ~argilla.client.feedback.schemas.questions.ConstrainedIntValue]]) โ€“ The number of visible labels in the UI. Defaults to 20, and must be 3 or greater.

  • name (str) โ€“

  • title (Optional[str]) โ€“

  • description (Optional[str]) โ€“

  • required (bool) โ€“

Examples

>>> from argilla.client.feedback.schemas.questions import MultiLabelQuestion
>>> MultiLabelQuestion(name="multi_label_question", title="Multi Label Question", labels=["label_1", "label_2"])
class argilla.client.feedback.schemas.questions.QuestionSchema(*, name, title=None, description=None, required=True, type)#

Base schema for the FeedbackDataset questions. Which means that all the questions in the dataset will have at least these fields.

Parameters:
  • name (str) โ€“ The name of the question. This is the only required field.

  • title (Optional[str]) โ€“ The title of the question. If not provided, it will be capitalized from the name field. And its what will be shown in the UI.

  • description (Optional[str]) โ€“ The description of the question. Defaults to None, and is not shown in the UI, otherwise, it will be shown in the tooltip close to each question.

  • required (bool) โ€“ Whether the question is required or not. Defaults to True. Note that at least one question must be required.

  • type (Optional[QuestionTypes]) โ€“ The type of the question. Defaults to None, and ideally it should be defined in the class inheriting from this one to be able to use a discriminated union based on the type field.

Disclaimer:

You should not use this class directly, but instead use the classes that inherit from this one, as they will have the type field already defined, and ensured to be supported by Argilla.

response(value)#

Method that will be used to create a response from the question and a value.

Parameters:

value (Union[StrictStr, StrictInt, List[str], List[dict], List[RankingValueSchema], List[SpanValueSchema]]) โ€“

Return type:

Dict[str, ValueSchema]

abstract property server_settings: Dict[str, Any]#

Abstract property that should be implemented by the classes that inherit from this one, and that will be used to create the FeedbackDataset in Argilla.

suggestion(value, **kwargs)#

Method that will be used to create a SuggestionSchema from the question and a suggested value.

Parameters:

value (Union[StrictStr, StrictInt, List[str], List[dict], List[RankingValueSchema], List[SpanValueSchema]]) โ€“

Return type:

SuggestionSchema

to_server_payload()#

Method that will be used to create the payload that will be sent to Argilla to create a field in the FeedbackDataset.

Return type:

Dict[str, Any]

class argilla.client.feedback.schemas.questions.RankingQuestion(*, name, title=None, description=None, required=True, type='ranking', values)#

Schema for the FeedbackDataset ranking questions, which are the ones that will require a ranking response from the user. More specifically, the user will be asked to rank the labels, all the labels need to be assigned (if either the question is required or if at least one label has been ranked), and there can be ties/draws.

Parameters:
  • type (Literal[<QuestionTypes.ranking: 'ranking'>]) โ€“ The type of the question. Defaults to โ€˜rankingโ€™ and cannot/shouldnโ€™t be modified.

  • values (Union[ConstrainedListValue[str], Dict[str, str]]) โ€“ The list of labels of the ranking question. The labels must be unique, and the list must contain at least two unique labels. Additionally, labels can also be a dictionary of labels, where the keys are the labels, and the values are the labels that will be shown in the UI.

  • name (str) โ€“

  • title (Optional[str]) โ€“

  • description (Optional[str]) โ€“

  • required (bool) โ€“

Examples

>>> from argilla.client.feedback.schemas.questions import RankingQuestion
>>> RankingQuestion(name="ranking_question", title="Ranking Question", values=["label_1", "label_2"])
property server_settings: Dict[str, Any]#

Abstract property that should be implemented by the classes that inherit from this one, and that will be used to create the FeedbackDataset in Argilla.

class argilla.client.feedback.schemas.questions.RatingQuestion(*, name, title=None, description=None, required=True, type='rating', values)#

Schema for the FeedbackDataset rating questions, which are the ones that will require a rating response from the user.

Parameters:
  • type (Literal[<QuestionTypes.rating: 'rating'>]) โ€“ The type of the question. Defaults to โ€˜ratingโ€™ and cannot/shouldnโ€™t be modified.

  • values (List[int]) โ€“ The list of integer values of the rating question. There is not need for the values to be sequential, but they must be unique, contain at least two unique integers in the range [1, 10].

  • name (str) โ€“

  • title (Optional[str]) โ€“

  • description (Optional[str]) โ€“

  • required (bool) โ€“

Examples

>>> from argilla.client.feedback.schemas.questions import RatingQuestion
>>> RatingQuestion(name="rating_question", title="Rating Question", values=[1, 2, 3, 4, 5])
property server_settings: Dict[str, Any]#

Abstract property that should be implemented by the classes that inherit from this one, and that will be used to create the FeedbackDataset in Argilla.

class argilla.client.feedback.schemas.questions.TextQuestion(*, name, title=None, description=None, required=True, type='text', use_markdown=False)#

Schema for the FeedbackDataset text questions, which are the ones that will require a text response from the user.

Parameters:
  • type (Literal[<QuestionTypes.text: 'text'>]) โ€“ The type of the question. Defaults to โ€˜textโ€™ and cannot/shouldnโ€™t be modified.

  • use_markdown (bool) โ€“ Whether the question should be rendered using markdown or not. Defaults to False.

  • name (str) โ€“

  • title (Optional[str]) โ€“

  • description (Optional[str]) โ€“

  • required (bool) โ€“

Examples

>>> from argilla.client.feedback.schemas.questions import TextQuestion
>>> TextQuestion(name="text_question", title="Text Question")
property server_settings: Dict[str, Any]#

Abstract property that should be implemented by the classes that inherit from this one, and that will be used to create the FeedbackDataset in Argilla.

class argilla.client.feedback.schemas.fields.FieldSchema(*, name, title=None, required=True, type)#

Base schema for the FeedbackDataset fields.

Parameters:
  • name (str) โ€“ The name of the field. This is the only required field.

  • title (Optional[str]) โ€“ The title of the field. If not provided, it will be capitalized from the name field. And its what will be shown in the UI.

  • required (bool) โ€“ Whether the field is required or not. Defaults to True. Note that at least one field must be required.

  • type (Optional[FieldTypes]) โ€“ The type of the field. Defaults to None, and ideally it should be defined in the class inheriting from this one to be able to use a discriminated union based on the type field.

Disclaimer:

You should not use this class directly, but instead use the classes that inherit from this one, as they will have the type field already defined, and ensured to be supported by Argilla.

abstract property server_settings: Dict[str, Any]#

Abstract property that should be implemented by the classes that inherit from this one, and that will be used to create the FeedbackDataset in Argilla.

to_server_payload()#

Method that will be used to create the payload that will be sent to Argilla to create a field in the FeedbackDataset.

Return type:

Dict[str, Any]

class argilla.client.feedback.schemas.fields.TextField(*, name, title=None, required=True, type='text', use_markdown=False)#

Schema for the FeedbackDataset text fields, which are the ones that will require a text to be defined as part of the record.

Parameters:
  • type (Literal[<FieldTypes.text: 'text'>]) โ€“ The type of the field. Defaults to โ€˜textโ€™ and cannot/shouldnโ€™t be modified.

  • use_markdown (bool) โ€“ Whether the field should be rendered using markdown or not. Defaults to False.

  • name (str) โ€“

  • title (Optional[str]) โ€“

  • required (bool) โ€“

Examples

>>> from argilla.client.feedback.schemas.fields import TextField
>>> TextField(name="text_field", title="Text Field")
property server_settings: Dict[str, Any]#

Abstract property that should be implemented by the classes that inherit from this one, and that will be used to create the FeedbackDataset in Argilla.

class argilla.client.feedback.schemas.records.FeedbackRecord(*, fields, metadata=None, vectors=None, responses=None, suggestions=None, external_id=None)#

Schema for the records of a FeedbackDataset.

Parameters:
  • fields (Dict[str, Optional[str]]) โ€“ Fields that match the FeedbackDataset defined fields. So this attribute contains the actual information shown in the UI for each record, being the record itself.

  • metadata (Dict[str, Any]) – Metadata to be included to enrich the information for a given record. Note that the metadata is not shown in the UI, so you will only be able to access it programmatically after pulling the records. Defaults to None.

  • responses (List[ResponseSchema]) โ€“ Responses given by either the current user, or one or a collection of users that must exist in Argilla. Each response corresponds to one of the FeedbackDataset questions, so the values should match the question type. Defaults to None.

  • suggestions (Union[Tuple[SuggestionSchema], List[SuggestionSchema]]) โ€“ A list of SuggestionSchema that contains the suggestions for the current record. Every suggestion is linked to only one question. Defaults to an empty list.

  • external_id (Optional[str]) โ€“ The external ID of the record, which means that the user can specify this ID to identify the record no matter what the Argilla ID is. Defaults to None.

  • vectors (Dict[str, List[float]]) โ€“

Examples

>>> from argilla.feedback import FeedbackRecord, ResponseSchema, SuggestionSchema, ValueSchema
>>> FeedbackRecord(
...     fields={"text": "This is the first record", "label": "positive"},
...     metadata={"first": True, "nested": {"more": "stuff"}},
...     responses=[ # optional
...         ResponseSchema(
...             user_id="user-1",
...             values={
...                 "question-1": ValueSchema(value="This is the first answer"),
...                 "question-2": ValueSchema(value=5),
...             },
...             status="submitted",
...         ),
...     ],
...     suggestions=[ # optional
...         SuggestionSchema(
...            question_name="question-1",
...            type="model",
...            score=0.9,
...            value="This is the first suggestion",
...            agent="agent-1",
...         ),
...     ],
...     external_id="entry-1",
... )
to_server_payload(question_name_to_id=None)#

Method that will be used to create the payload that will be sent to Argilla to create a FeedbackRecord in the FeedbackDataset.

Parameters:

question_name_to_id (Optional[Dict[str, UUID]]) โ€“

Return type:

Dict[str, Any]

property unified_responses: Optional[Dict[str, List[UnifiedValueSchema]]]#

Property that returns the unified responses for the record.

class argilla.client.feedback.schemas.records.RankingValueSchema(*, value, rank=None)#

Schema for the response value of a RankingQuestion. Note that more than one record may share the same rank.

Parameters:
  • value (StrictStr) โ€“ The value of the record.

  • rank (Optional[ConstrainedIntValue]) โ€“ The rank of the record.

class argilla.client.feedback.schemas.records.ResponseSchema(*, user_id=None, values=None, status=ResponseStatus.submitted)#

Schema for the FeedbackRecord response.

Parameters:
  • user_id (Optional[UUID]) – ID of the user that provided the response. Defaults to None, and is automatically filled in internally once the question is pushed to Argilla.

  • values (Optional[Union[List[Dict[str, ValueSchema]], Dict[str, ValueSchema]]]) โ€“ Values of the response, should match the questions in the record.

  • status (Union[ResponseStatus, str]) โ€“ Status of the response. Defaults to submitted.

Examples

>>> from argilla.client.feedback.schemas.responses import ResponseSchema, ValueSchema
>>> ResponseSchema(
...     values={
...         "question_1": ValueSchema(value="answer_1"),
...         "question_2": ValueSchema(value="answer_2"),
...     }
... )
to_server_payload()#

Method that will be used to create the payload that will be sent to Argilla to create a ResponseSchema for a FeedbackRecord.

Return type:

Dict[str, Any]

class argilla.client.feedback.schemas.records.SuggestionSchema(*, question_name, value, score=None, type=None, agent=None)#

Schema for the suggestions for the questions related to the record.

Parameters:
  • question_name (str) โ€“ name of the question in the FeedbackDataset.

  • type (Optional[Literal['model', 'human']]) – type of the suggestion. Defaults to None. Possible values are model or human.

  • score (Optional[ConstrainedFloatValue]) โ€“ score of the suggestion. Defaults to None.

  • value (Union[StrictStr, StrictInt, List[str], List[dict], List[RankingValueSchema], List[SpanValueSchema]]) โ€“ value of the suggestion, which should match the type of the question.

  • agent (Optional[str]) โ€“ agent that generated the suggestion. Defaults to None.

Examples

>>> from argilla.client.feedback.schemas.suggestions import SuggestionSchema
>>> SuggestionSchema(
...     question_name="question-1",
...     type="model",
...     score=0.9,
...     value="This is the first suggestion",
...     agent="agent-1",
... )
to_server_payload(question_name_to_id)#

Method that will be used to create the payload that will be sent to Argilla to create a SuggestionSchema for a FeedbackRecord.

Parameters:

question_name_to_id (Dict[str, UUID]) โ€“

Return type:

Dict[str, Any]

class argilla.client.feedback.schemas.records.ValueSchema(*, value)#

Schema for any FeedbackRecord response value.

Parameters:

value (Union[StrictStr, StrictInt, List[str], List[dict], List[RankingValueSchema], List[SpanValueSchema]]) โ€“ The value of the record.