Client#
Here we describe the Python client of Argilla that we divide into three basic modules:
Methods: These methods make up the interface to interact with Argilla’s REST API.
Records: You need to wrap your data in these Records for Argilla to understand it.
Datasets: You can wrap your records in these Datasets for extra functionality.
FeedbackDataset: the dataset format for the FeedbackTask and LLM support.
Methods#
- argilla.active_client()#
Returns the active Argilla client.
If the active client is None, a default one is initialized.
- Return type
argilla.client.client.Argilla
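The "initialize a default one if None" behavior follows a lazy-singleton pattern, which can be sketched in plain Python. `ClientStub` and the module-level `_active_client` below are illustrative stand-ins, not Argilla internals:

```python
class ClientStub:
    """Illustrative stand-in for the real Argilla client."""
    def __init__(self, api_url="http://localhost:6900"):
        self.api_url = api_url

_active_client = None  # module-level cache, as in a typical lazy singleton

def active_client():
    """Return the active client, initializing a default one if it is None."""
    global _active_client
    if _active_client is None:
        _active_client = ClientStub()
    return _active_client

client = active_client()
assert client is active_client()  # subsequent calls reuse the same instance
```

Because the default is created on first use, calling `rg.init(...)` beforehand is only needed when the defaults (URL, key, workspace) are not what you want.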
- argilla.copy(dataset, name_of_copy, workspace=None)#
Creates a copy of a dataset, including its tags and metadata.
- Parameters
dataset (str) – Name of the source dataset
name_of_copy (str) – Name of the copied dataset
workspace (Optional[str]) – If provided, dataset will be copied to that workspace
Examples
>>> import argilla as rg
>>> rg.copy("my_dataset", name_of_copy="new_dataset")
>>> rg.load("new_dataset")
- argilla.delete(name, workspace=None)#
Deletes a dataset.
- Parameters
name (str) – The dataset name.
workspace (Optional[str]) – The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.
Examples
>>> import argilla as rg
>>> rg.delete(name="example-dataset")
- argilla.delete_records(name, workspace=None, query=None, ids=None, discard_only=False, discard_when_forbidden=True)#
Deletes records from an Argilla dataset.
- Parameters
name (str) – The dataset name.
workspace (Optional[str]) – The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.
query (Optional[str]) – An ElasticSearch query with the query string syntax.
ids (Optional[List[Union[str, int]]]) – If provided, deletes dataset records with given ids.
discard_only (bool) – If True, matched records won’t be deleted. Instead, they will be marked as Discarded.
discard_when_forbidden (bool) – Only a super-user or the dataset creator can delete records from a dataset, so running a “hard” deletion as another user will raise a ForbiddenApiError. If this parameter is True, the client will automatically try to mark the records as Discarded instead. Defaults to True.
- Returns
The number of matched records and the number of records actually processed. These numbers may differ if data conflicts occur during the operation (for example, if some matched records change while being deleted).
- Return type
Tuple[int, int]
Examples
>>> ## Delete by id
>>> import argilla as rg
>>> rg.delete_records(name="example-dataset", ids=[1, 3, 5])
>>> ## Discard records by query
>>> import argilla as rg
>>> rg.delete_records(name="example-dataset", query="metadata.code=33", discard_only=True)
- argilla.get_workspace()#
Returns the name of the active workspace.
- Returns
The name of the active workspace as a string.
- Return type
str
- argilla.init(api_url=None, api_key=None, workspace=None, timeout=60, extra_headers=None)#
Initializes the Python client.
A default client is automatically initialized when you call other client methods. The arguments provided here overwrite the corresponding environment variables.
- Parameters
api_url (Optional[str]) – Address of the REST API. If None (default) and the env variable ARGILLA_API_URL is not set, it will default to http://localhost:6900.
api_key (Optional[str]) – Authentication key for the REST API. If None (default) and the env variable ARGILLA_API_KEY is not set, it will default to argilla.apikey.
workspace (Optional[str]) – The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.
timeout (int) – Wait timeout seconds for the connection to time out. Default: 60.
extra_headers (Optional[Dict[str, str]]) – Extra HTTP headers sent to the server. You can use this to customize the headers of argilla client requests, like additional security restrictions. Default: None.
Examples
>>> import argilla as rg
>>>
>>> rg.init(api_url="http://localhost:9090", api_key="4AkeAPIk3Y")
>>> # Customizing request headers
>>> headers = {"X-Client-id": "id", "X-Secret": "secret"}
>>> rg.init(api_url="http://localhost:9090", api_key="4AkeAPIk3Y", extra_headers=headers)
- argilla.load(name, workspace=None, query=None, vector=None, ids=None, limit=None, sort=None, id_from=None, batch_size=250, include_vectors=True, include_metrics=True, as_pandas=None)#
Loads an Argilla dataset.
- Parameters
name (str) – The dataset name.
workspace (Optional[str]) – The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.
query (Optional[str]) – An ElasticSearch query with the query string syntax.
vector (Optional[Tuple[str, List[float]]]) – Vector configuration for a semantic search
ids (Optional[List[Union[str, int]]]) – If provided, load dataset records with given ids.
limit (Optional[int]) – The number of records to retrieve.
sort (Optional[List[Tuple[str, str]]]) – The fields on which to sort [(<field_name>, ‘asc|desc’)].
id_from (Optional[str]) – If provided, starts gathering the records from that record on. As the records returned by the load method are sorted by ID, id_from can be used to load the dataset in batches.
batch_size (int) – If provided, load batch_size samples per request. A lower batch size may help avoid timeouts.
include_vectors (bool) – When set to False, indicates that records will be retrieved excluding their vectors, if any. By default, this parameter is set to True, meaning that vectors will be included.
include_metrics (bool) – When set to False, indicates that records will be retrieved excluding their metrics. By default, this parameter is set to True, meaning that metrics will be included.
as_pandas – DEPRECATED! To get a pandas DataFrame, use rg.load('my_dataset').to_pandas() instead.
- Returns
An Argilla dataset.
- Return type
Union[argilla.client.datasets.DatasetForTextClassification, argilla.client.datasets.DatasetForTokenClassification, argilla.client.datasets.DatasetForText2Text]
Examples
Basic Loading: load the samples sorted by their ID
>>> import argilla as rg
>>> dataset = rg.load(name="example-dataset")
- Iterate over a large dataset:
When dealing with a large dataset you might want to load it in batches to optimize memory consumption and avoid network timeouts. To that end, you can iterate over the whole dataset in batches using the id_from parameter. This parameter acts as a delimiter, retrieving the N items after the given ID, where N is determined by the limit parameter. Note: if no limit is given, the whole dataset after that ID will be retrieved.
>>> import argilla as rg
>>> dataset_batch_1 = rg.load(name="example-dataset", limit=1000)
>>> dataset_batch_2 = rg.load(name="example-dataset", limit=1000, id_from=dataset_batch_1[-1].id)
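The id_from/limit pattern generalizes to a simple pagination loop. The sketch below uses a hypothetical fetch_page callable in place of rg.load, assuming only that it returns records sorted by ID:

```python
def paginate(fetch_page, limit=1000):
    """Yield all records in batches using ID-based pagination.

    `fetch_page(limit, id_from)` is a stand-in for rg.load(...): it must
    return at most `limit` records sorted by ID, starting after `id_from`.
    """
    id_from = None
    while True:
        batch = fetch_page(limit=limit, id_from=id_from)
        if not batch:
            return
        yield from batch
        if len(batch) < limit:  # short page means we reached the end
            return
        id_from = batch[-1]["id"]

# A toy in-memory "dataset" to demonstrate the pattern.
data = [{"id": f"rec-{i:03d}"} for i in range(25)]

def fetch_page(limit, id_from):
    start = 0
    if id_from is not None:
        start = next(i for i, r in enumerate(data) if r["id"] == id_from) + 1
    return data[start:start + limit]

records = list(paginate(fetch_page, limit=10))  # three pages: 10 + 10 + 5
```

Because records are sorted by ID, the last ID of one page is a stable cursor for the next, which is exactly what passing `dataset_batch_1[-1].id` as `id_from` does.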
- argilla.log(records, name, workspace=None, tags=None, metadata=None, batch_size=100, verbose=True, background=False, chunk_size=None, num_threads=0, max_retries=3)#
Logs Records to argilla.
The logging happens asynchronously in a background thread.
- Parameters
records (Union[argilla.client.models.TextClassificationRecord, argilla.client.models.TokenClassificationRecord, argilla.client.models.Text2TextRecord, argilla.client.models.TextGenerationRecord, Iterable[Union[argilla.client.models.TextClassificationRecord, argilla.client.models.TokenClassificationRecord, argilla.client.models.Text2TextRecord, argilla.client.models.TextGenerationRecord]], argilla.client.datasets.DatasetForTextClassification, argilla.client.datasets.DatasetForTokenClassification, argilla.client.datasets.DatasetForText2Text]) – The record, an iterable of records, or a dataset to log.
name (str) – The dataset name.
workspace (Optional[str]) – The workspace to which records will be logged/loaded. If None (default) and the env variable ARGILLA_WORKSPACE is not set, it will default to the private user workspace.
tags (Optional[Dict[str, str]]) – A dictionary of tags related to the dataset.
metadata (Optional[Dict[str, Any]]) – A dictionary of extra info for the dataset.
batch_size (int) – The batch size for a data bulk.
verbose (bool) – If True, shows a progress bar and prints out a quick summary at the end.
background (bool) – If True, we will NOT wait for the logging process to finish and will return an asyncio.Future object. You probably want to set verbose to False in that case.
chunk_size (Optional[int]) – DEPRECATED! Use batch_size instead.
num_threads (int) – If > 0, uses num_threads separate threads to send the batches concurrently. Defaults to 0, which means no threading at all.
max_retries (int) – Number of retries when logging a batch of records if a httpx.TransportError occurs. Default 3.
- Returns
Summary of the response from the REST API. If the background argument is set to True, an asyncio.Future will be returned instead.
- Return type
Union[argilla.client.models.BulkResponse, _asyncio.Future]
Examples
>>> import argilla as rg
>>> record = rg.TextClassificationRecord(
...     text="my first argilla example",
...     prediction=[('spam', 0.8), ('ham', 0.2)]
... )
>>> rg.log(record, name="example-dataset")
1 records logged to http://localhost:6900/datasets/argilla/example-dataset
BulkResponse(dataset='example-dataset', processed=1, failed=0)
>>>
>>> # Logging records in the background
>>> rg.log(record, name="example-dataset", background=True, verbose=False)
<Future at 0x7f675a1fffa0 state=pending>
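The interplay of batch_size and num_threads can be sketched with a thread pool. Here send_batch is a hypothetical stand-in for the REST bulk call, not Argilla's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def log_in_batches(records, send_batch, batch_size=100, num_threads=0):
    """Split `records` into batches and send each one via `send_batch`.

    `send_batch` takes a list of records and returns the number processed.
    With num_threads > 0, batches are sent concurrently, mirroring the
    num_threads parameter of rg.log; with 0, they are sent sequentially.
    """
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    if num_threads > 0:
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            processed = sum(pool.map(send_batch, batches))
    else:
        processed = sum(send_batch(batch) for batch in batches)
    return processed

# len() as a trivial send_batch: "processes" every record it receives.
processed = log_in_batches(list(range(250)), send_batch=len,
                           batch_size=100, num_threads=4)
```

Concurrency only helps when the server can absorb parallel bulk requests; a lower batch_size with no threading is the safer choice when timeouts are the problem.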
- argilla.set_workspace(workspace)#
Sets the active workspace.
- Parameters
workspace (str) – The new workspace
Records#
This module contains the data models for the interface
- class argilla.client.models.Framework(value)#
An enumeration.
- class argilla.client.models.Text2TextRecord(*, text, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#
Record for a text to text task
- Parameters
text (str) – The input of the record
prediction (Optional[List[Union[str, Tuple[str, float]]]]) – A list of strings or tuples containing predictions for the input text. If tuples, the first entry is the predicted text, the second entry is its corresponding score.
prediction_agent (Optional[str]) – Name of the prediction agent. By default, this is set to the hostname of your machine.
annotation (Optional[str]) – A string representing the expected output text for the given input text.
annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.
vectors (Optional[Dict[str, List[float]]]) – Embedding data for the record: a mapping from vector name to a list of floats.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Optional[Dict[str, Any]]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp for the creation of the record. Defaults to datetime.datetime.now().
metrics (Optional[Dict[str, Any]]) – READ ONLY! Metrics at record level provided by the server when using rg.load. This attribute will be ignored when using rg.log.
search_keywords (Optional[List[str]]) – READ ONLY! Relevant record keywords/terms for provided query when using rg.load. This attribute will be ignored when using rg.log.
- Return type
None
Examples
>>> import argilla as rg
>>> record = rg.Text2TextRecord(
...     text="My name is Sarah and I love my dog.",
...     prediction=["Je m'appelle Sarah et j'aime mon chien."],
...     vectors={
...         "bert_base_uncased": [1.2, 2.3, 3.4, 5.2, 6.5],
...         "xlm_multilingual_uncased": [2.2, 5.3, 5.4, 3.2, 2.5]
...     }
... )
- classmethod prediction_as_tuples(prediction)#
Preprocesses the predictions and wraps them in tuples if needed.
- Parameters
prediction (Optional[List[Union[str, Tuple[str, float]]]]) –
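A minimal sketch of the normalization this method performs, written in plain Python rather than taken from the implementation; the default score of 1.0 for bare strings is an assumption:

```python
def prediction_as_tuples(prediction):
    """Normalize text-to-text predictions to (text, score) tuples.

    Bare strings are wrapped as tuples with an assumed default score of
    1.0; entries that are already tuples are left untouched.
    """
    if prediction is None:
        return None
    return [p if isinstance(p, tuple) else (p, 1.0) for p in prediction]

normalized = prediction_as_tuples(
    ["Je m'appelle Sarah.", ("J'adore mon chien.", 0.8)]
)
```

This is why the prediction parameter accepts a mixed list of strings and tuples: everything is reduced to one canonical tuple form before being sent to the server.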
- class argilla.client.models.TextClassificationRecord(*, text=None, inputs=None, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, multi_label=False, explanation=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#
Record for text classification
- Parameters
text (Optional[str]) – The input of the record. Provide either ‘text’ or ‘inputs’.
inputs (Optional[Union[str, List[str], Dict[str, Union[str, List[str]]]]]) – Various inputs of the record (see examples below). Provide either ‘text’ or ‘inputs’.
prediction (Optional[List[Tuple[str, float]]]) – A list of tuples containing the predictions for the record. The first entry of the tuple is the predicted label, the second entry is its corresponding score.
prediction_agent (Optional[str]) – Name of the prediction agent. By default, this is set to the hostname of your machine.
annotation (Optional[Union[str, List[str]]]) – A string or a list of strings (multilabel) corresponding to the annotation (gold label) for the record.
annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.
vectors (Optional[Dict[str, List[float]]]) – Embedding data for the record: a mapping from vector name to a list of floats.
multi_label (bool) – Is the prediction/annotation for a multi label classification task? Defaults to False.
explanation (Optional[Dict[str, List[argilla.client.models.TokenAttributions]]]) – A dictionary containing the attributions of each token to the prediction. The keys map the input of the record (see inputs) to the TokenAttributions.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Optional[Dict[str, Any]]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp for the creation of the record. Defaults to datetime.datetime.now().
metrics (Optional[Dict[str, Any]]) – READ ONLY! Metrics at record level provided by the server when using rg.load. This attribute will be ignored when using rg.log.
search_keywords (Optional[List[str]]) – READ ONLY! Relevant record keywords/terms for provided query when using rg.load. This attribute will be ignored when using rg.log.
- Return type
None
Examples
>>> # Single text input
>>> import argilla as rg
>>> record = rg.TextClassificationRecord(
...     text="My first argilla example",
...     prediction=[('eng', 0.9), ('esp', 0.1)],
...     vectors={
...         "english_bert_vector": [1.2, 2.3, 3.1, 3.3]
...     }
... )
>>>
>>> # Various inputs
>>> record = rg.TextClassificationRecord(
...     inputs={
...         "subject": "Has ganado 1 million!",
...         "body": "Por usar argilla te ha tocado este premio: <link>"
...     },
...     prediction=[('spam', 0.99), ('ham', 0.01)],
...     annotation="spam",
...     vectors={
...         "distilbert_uncased": [1.13, 4.1, 6.3, 4.2, 9.1],
...         "xlm_roberta_cased": [1.1, 2.1, 3.3, 4.2, 2.1],
...     }
... )
- class argilla.client.models.TextGenerationRecord(*, text, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#
- Parameters
text (str) –
prediction (Optional[List[Union[str, Tuple[str, float]]]]) –
prediction_agent (Optional[str]) –
annotation (Optional[str]) –
annotation_agent (Optional[str]) –
vectors (Optional[Dict[str, List[float]]]) –
id (Optional[Union[int, str]]) –
metadata (Optional[Dict[str, Any]]) –
status (Optional[str]) –
event_timestamp (Optional[datetime.datetime]) –
metrics (Optional[Dict[str, Any]]) –
search_keywords (Optional[List[str]]) –
- Return type
None
- class argilla.client.models.TokenAttributions(*, token, attributions=None)#
Attribution of the token to the predicted label.
In the Argilla app this is only supported for TextClassificationRecord and the multi_label=False case.
- Parameters
token (str) – The input token.
attributions (Dict[str, float]) – A dictionary containing label-attribution pairs.
- Return type
None
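The explanation structure built from TokenAttributions can be illustrated with plain dicts; the field name "text", the labels, and the scores below are made-up examples, and the dicts stand in for TokenAttributions instances:

```python
def build_explanation(tokens, scores_per_label):
    """Pair each input token with its per-label attribution scores.

    Returns one entry per token, each holding a label -> attribution
    mapping, mirroring the token/attributions fields described above.
    """
    return [
        {
            "token": token,
            "attributions": {
                label: scores[i] for label, scores in scores_per_label.items()
            },
        }
        for i, token in enumerate(tokens)
    ]

tokens = ["free", "money", "now"]
scores = {"spam": [0.7, 0.9, 0.4], "ham": [-0.2, -0.5, 0.1]}
# Keys of the explanation dict map record input fields to their attributions.
explanation = {"text": build_explanation(tokens, scores)}
```

In a real record you would pass `explanation` to TextClassificationRecord with TokenAttributions objects in place of the plain dicts.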
- class argilla.client.models.TokenClassificationRecord(text=None, tokens=None, tags=None, *, prediction=None, prediction_agent=None, annotation=None, annotation_agent=None, vectors=None, id=None, metadata=None, status=None, event_timestamp=None, metrics=None, search_keywords=None)#
Record for a token classification task
- Parameters
text (Optional[str]) – The input of the record
tokens (Optional[Union[List[str], Tuple[str, ...]]]) – The tokenized input of the record. We use this to guide the annotation process and to cross-check the spans of your prediction/annotation.
prediction (Optional[List[Union[Tuple[str, int, int], Tuple[str, int, int, Optional[float]]]]]) – A list of tuples containing the predictions for the record. The first entry of the tuple is the name of predicted entity, the second and third entry correspond to the start and stop character index of the entity. The fourth entry is optional and corresponds to the score of the entity (a float number between 0 and 1).
prediction_agent (Optional[str]) – Name of the prediction agent. By default, this is set to the hostname of your machine.
annotation (Optional[List[Tuple[str, int, int]]]) – A list of tuples containing annotations (gold labels) for the record. The first entry of the tuple is the name of the entity, the second and third entry correspond to the start and stop char index of the entity.
annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.
vectors (Optional[Dict[str, List[float]]]) – Embedding data for the record: a mapping from vector name to a list of floats.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Optional[Dict[str, Any]]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp for the creation of the record. Defaults to datetime.datetime.now().
metrics (Optional[Dict[str, Any]]) – READ ONLY! Metrics at record level provided by the server when using rg.load. This attribute will be ignored when using rg.log.
search_keywords (Optional[List[str]]) – READ ONLY! Relevant record keywords/terms for provided query when using rg.load. This attribute will be ignored when using rg.log.
tags (Optional[List[str]]) –
- Return type
None
Examples
>>> import argilla as rg
>>> record = rg.TokenClassificationRecord(
...     text="Michael is a professor at Harvard",
...     tokens=["Michael", "is", "a", "professor", "at", "Harvard"],
...     prediction=[('NAME', 0, 7), ('LOC', 26, 33)],
...     vectors={
...         "bert_base_uncased": [3.2, 4.5, 5.6, 8.9]
...     }
... )
- char_id2token_id(char_idx)#
DEPRECATED, please use the
argilla.utisl.span_utils.SpanUtils.char_to_token_idx
dict instead.- Parameters
char_idx (int) –
- Return type
Optional[int]
- spans2iob(spans=None)#
DEPRECATED, please use the
argilla.utils.SpanUtils.to_tags()
method.- Parameters
spans (Optional[List[Tuple[str, int, int]]]) –
- Return type
Optional[List[str]]
- token_span(token_idx)#
DEPRECATED, please use the
argilla.utisl.span_utils.SpanUtils.token_to_char_idx
dict instead.- Parameters
token_idx (int) –
- Return type
Tuple[int, int]
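The char/token index mapping that these deprecated helpers (and SpanUtils) rely on can be sketched as follows. This is a simplification, assuming each token can be located left-to-right in the text:

```python
def char_to_token_map(text, tokens):
    """Map each character index covered by a token to that token's index.

    Tokens are located left-to-right in `text`; every character inside a
    token is mapped to the token's position in `tokens`. Characters between
    tokens (e.g. spaces) are left unmapped.
    """
    mapping = {}
    cursor = 0
    for token_idx, token in enumerate(tokens):
        start = text.index(token, cursor)  # find the token after the cursor
        for char_idx in range(start, start + len(token)):
            mapping[char_idx] = token_idx
        cursor = start + len(token)
    return mapping

mapping = char_to_token_map("Michael is a professor",
                            ["Michael", "is", "a", "professor"])
```

This mapping is what lets Argilla cross-check that prediction/annotation character spans line up with token boundaries.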
Datasets#
- class argilla.client.datasets.DatasetForText2Text(records=None)#
This Dataset contains Text2TextRecord records.
It allows you to export/import records into/from different formats, loop over the records, and access them by index.
- Parameters
records (Optional[List[argilla.client.models.Text2TextRecord]]) – A list of `Text2TextRecord`s.
- Raises
WrongRecordTypeError – When the record type in the provided list does not correspond to the dataset type.
Examples
>>> # Import/export records:
>>> import argilla as rg
>>> dataset = rg.DatasetForText2Text.from_pandas(my_dataframe)
>>> dataset.to_datasets()
>>>
>>> # Passing in a list of records:
>>> records = [
...     rg.Text2TextRecord(text="example"),
...     rg.Text2TextRecord(text="another example"),
... ]
>>> dataset = rg.DatasetForText2Text(records)
>>> assert len(dataset) == 2
>>>
>>> # Looping over the dataset:
>>> for record in dataset:
...     print(record)
>>>
>>> # Indexing into the dataset:
>>> dataset[0]
... rg.Text2TextRecord(text="example")
>>> dataset[0] = rg.Text2TextRecord(text="replaced example")
- classmethod from_datasets(dataset, text=None, annotation=None, metadata=None, id=None)#
Imports records from a datasets.Dataset.
Columns that are not supported are ignored.
- Parameters
dataset (datasets.Dataset) – A datasets Dataset from which to import the records.
text (Optional[str]) – The field name used as record text. Default: None
annotation (Optional[str]) – The field name used as record annotation. Default: None
metadata (Optional[Union[str, List[str]]]) – The field name used as record metadata. Default: None
id (Optional[str]) –
- Returns
The imported records in an Argilla Dataset.
- Return type
DatasetForText2Text
Examples
>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [["mi ejemplo", "ejemplo mio"]]
... })
>>> # or
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [[{"text": "mi ejemplo", "score": 0.9}]]
... })
>>> DatasetForText2Text.from_datasets(ds)
- classmethod from_pandas(dataframe)#
Imports records from a pandas.DataFrame.
Columns that are not supported are ignored.
- Parameters
dataframe (pandas.core.frame.DataFrame) – A pandas DataFrame from which to import the records.
- Returns
The imported records in an Argilla Dataset.
- Return type
DatasetForText2Text
- class argilla.client.datasets.DatasetForTextClassification(records=None)#
This Dataset contains TextClassificationRecord records.
It allows you to export/import records into/from different formats, loop over the records, and access them by index.
- Parameters
records (Optional[List[argilla.client.models.TextClassificationRecord]]) – A list of `TextClassificationRecord`s.
- Raises
WrongRecordTypeError – When the record type in the provided list does not correspond to the dataset type.
Examples
>>> # Import/export records:
>>> import argilla as rg
>>> dataset = rg.DatasetForTextClassification.from_pandas(my_dataframe)
>>> dataset.to_datasets()
>>>
>>> # Looping over the dataset:
>>> for record in dataset:
...     print(record)
>>>
>>> # Passing in a list of records:
>>> records = [
...     rg.TextClassificationRecord(text="example"),
...     rg.TextClassificationRecord(text="another example"),
... ]
>>> dataset = rg.DatasetForTextClassification(records)
>>> assert len(dataset) == 2
>>>
>>> # Indexing into the dataset:
>>> dataset[0]
... rg.TextClassificationRecord(text="example")
>>> dataset[0] = rg.TextClassificationRecord(text="replaced example")
- classmethod from_datasets(dataset, text=None, id=None, inputs=None, annotation=None, metadata=None)#
Imports records from a datasets.Dataset.
Columns that are not supported are ignored.
- Parameters
dataset (datasets.Dataset) – A datasets Dataset from which to import the records.
text (Optional[str]) – The field name used as record text. Default: None
id (Optional[str]) – The field name used as record id. Default: None
inputs (Optional[Union[str, List[str]]]) – A list of field names used for record inputs. Default: None
annotation (Optional[str]) – The field name used as record annotation. Default: None
metadata (Optional[Union[str, List[str]]]) – The field name used as record metadata. Default: None
- Returns
The imported records in an Argilla Dataset.
- Return type
DatasetForTextClassification
Examples
>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "inputs": ["example"],
...     "prediction": [
...         [{"label": "LABEL1", "score": 0.9}, {"label": "LABEL2", "score": 0.1}]
...     ]
... })
>>> DatasetForTextClassification.from_datasets(ds)
- classmethod from_pandas(dataframe)#
Imports records from a pandas.DataFrame.
Columns that are not supported are ignored.
- Parameters
dataframe (pandas.core.frame.DataFrame) – A pandas DataFrame from which to import the records.
- Returns
The imported records in an Argilla Dataset.
- Return type
DatasetForTextClassification
- class argilla.client.datasets.DatasetForTokenClassification(records=None)#
This Dataset contains TokenClassificationRecord records.
It allows you to export/import records into/from different formats, loop over the records, and access them by index.
- Parameters
records (Optional[List[argilla.client.models.TokenClassificationRecord]]) – A list of `TokenClassificationRecord`s.
- Raises
WrongRecordTypeError – When the record type in the provided list does not correspond to the dataset type.
Examples
>>> # Import/export records:
>>> import argilla as rg
>>> dataset = rg.DatasetForTokenClassification.from_pandas(my_dataframe)
>>> dataset.to_datasets()
>>>
>>> # Passing in a list of records:
>>> records = [
...     rg.TokenClassificationRecord(text="example", tokens=["example"]),
...     rg.TokenClassificationRecord(text="another example", tokens=["another", "example"]),
... ]
>>> dataset = rg.DatasetForTokenClassification(records)
>>> assert len(dataset) == 2
>>>
>>> # Looping over the dataset:
>>> for record in dataset:
...     print(record)
>>>
>>> # Indexing into the dataset:
>>> dataset[0]
... rg.TokenClassificationRecord(text="example", tokens=["example"])
>>> dataset[0] = rg.TokenClassificationRecord(text="replace example", tokens=["replace", "example"])
- classmethod from_datasets(dataset, text=None, id=None, tokens=None, tags=None, metadata=None)#
Imports records from a datasets.Dataset.
Columns that are not supported are ignored.
- Parameters
dataset (datasets.Dataset) – A datasets Dataset from which to import the records.
text (Optional[str]) – The field name used as record text. Default: None
id (Optional[str]) – The field name used as record id. Default: None
tokens (Optional[str]) – The field name used as record tokens. Default: None
tags (Optional[str]) – The field name used as record tags. Default: None
metadata (Optional[Union[str, List[str]]]) – The field name used as record metadata. Default: None
- Returns
The imported records in an Argilla Dataset.
- Return type
DatasetForTokenClassification
Examples
>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "tokens": [["my", "example"]],
...     "prediction": [
...         [{"label": "LABEL1", "start": 3, "end": 10, "score": 1.0}]
...     ]
... })
>>> DatasetForTokenClassification.from_datasets(ds)
- classmethod from_pandas(dataframe)#
Imports records from a pandas.DataFrame.
Columns that are not supported are ignored.
- Parameters
dataframe (pandas.core.frame.DataFrame) – A pandas DataFrame from which to import the records.
- Returns
The imported records in an Argilla Dataset.
- Return type
DatasetForTokenClassification
- argilla.client.datasets.read_datasets(dataset, task, **kwargs)#
Reads a datasets Dataset and returns an Argilla Dataset.
Columns not supported by the Record instance corresponding to the task are ignored.
- Parameters
dataset (datasets.Dataset) – Dataset to be read in.
task (Union[str, argilla.client.sdk.datasets.models.TaskType]) – Task for the dataset, one of: [“TextClassification”, “TokenClassification”, “Text2Text”].
**kwargs – Passed on to the task-specific DatasetFor*.from_datasets() method.
- Returns
An Argilla dataset for the given task.
- Return type
Union[argilla.client.datasets.DatasetForTextClassification, argilla.client.datasets.DatasetForTokenClassification, argilla.client.datasets.DatasetForText2Text]
Examples
>>> # Read text classification records from a datasets Dataset
>>> import datasets
>>> ds = datasets.Dataset.from_dict({
...     "inputs": ["example"],
...     "prediction": [
...         [{"label": "LABEL1", "score": 0.9}, {"label": "LABEL2", "score": 0.1}]
...     ]
... })
>>> read_datasets(ds, task="TextClassification")
>>>
>>> # Read token classification records from a datasets Dataset
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "tokens": [["my", "example"]],
...     "prediction": [
...         [{"label": "LABEL1", "start": 3, "end": 10}]
...     ]
... })
>>> read_datasets(ds, task="TokenClassification")
>>>
>>> # Read text2text records from a datasets Dataset
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [["mi ejemplo", "ejemplo mio"]]
... })
>>> # or
>>> ds = datasets.Dataset.from_dict({
...     "text": ["my example"],
...     "prediction": [[{"text": "mi ejemplo", "score": 0.9}]]
... })
>>> read_datasets(ds, task="Text2Text")
- argilla.client.datasets.read_pandas(dataframe, task)#
Reads a pandas DataFrame and returns an Argilla Dataset.
Columns not supported by the Record instance corresponding to the task are ignored.
- Parameters
dataframe (pandas.core.frame.DataFrame) – Dataframe to be read in.
task (Union[str, argilla.client.sdk.datasets.models.TaskType]) – Task for the dataset, one of: [“TextClassification”, “TokenClassification”, “Text2Text”]
- Returns
An Argilla dataset for the given task.
- Return type
Union[argilla.client.datasets.DatasetForTextClassification, argilla.client.datasets.DatasetForTokenClassification, argilla.client.datasets.DatasetForText2Text]
Examples
>>> # Read text classification records from a pandas DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "inputs": ["example"],
...     "prediction": [
...         [("LABEL1", 0.9), ("LABEL2", 0.1)]
...     ]
... })
>>> read_pandas(df, task="TextClassification")
>>>
>>> # Read token classification records from a pandas DataFrame
>>> df = pd.DataFrame({
...     "text": ["my example"],
...     "tokens": [["my", "example"]],
...     "prediction": [
...         [("LABEL1", 3, 10)]
...     ]
... })
>>> read_pandas(df, task="TokenClassification")
>>>
>>> # Read text2text records from a pandas DataFrame
>>> df = pd.DataFrame({
...     "text": ["my example"],
...     "prediction": [["mi ejemplo", "ejemplo mio"]]
... })
>>> # or
>>> df = pd.DataFrame({
...     "text": ["my example"],
...     "prediction": [[("mi ejemplo", 0.9)]]
... })
>>> read_pandas(df, task="Text2Text")