💫 Update a dataset#

Feedback Dataset#

Warning

The dataset class covered in this section is the FeedbackDataset. This fully configurable dataset will replace the DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text in Argilla 2.0. Not sure which dataset to use? Check out our section on choosing a dataset.

Oftentimes datasets that we have created previously need modifications or updates. In this section, we will explore some of the most common workflows to change an existing FeedbackDataset in Argilla.

Remember that you will need to connect to Argilla to perform any of the actions below.

import argilla as rg

rg.init(
    api_url="<ARGILLA_API_URL>",
    api_key="<ARGILLA_API_KEY>"
)

Add records#

To add one or more FeedbackRecords to an existing dataset, first load the FeedbackDataset from Argilla by calling FeedbackDataset.from_argilla, and then call its add_records method.

Note

From Argilla 1.14.0, calling from_argilla will pull the FeedbackDataset from Argilla, but the instance will be remote, which implies that the additions, updates, and deletions of records will be pushed to Argilla as soon as they are made. This is a change from previous versions of Argilla, where you had to call push_to_argilla again to push the changes to Argilla.

Argilla 1.14.0 or higher:

# load the dataset
dataset = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")
# list of Feedback records to add
new_records = [...]
# add records to the dataset (pushed to Argilla right away)
dataset.add_records(new_records)

Lower than Argilla 1.14.0:

# load the dataset
dataset = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")
# list of Feedback records to add
new_records = [...]
# add records to the dataset
dataset.add_records(new_records)
# push the dataset to Argilla
dataset.push_to_argilla()
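
The version-dependent behaviour described in the note can be wrapped in a small helper. The following is only a sketch: `add_and_sync`, `parse_version`, and `StandInDataset` are hypothetical names introduced for illustration, and the stand-in class merely logs method calls so the snippet runs without an Argilla server. It also assumes a plain numeric version string such as "1.14.0".

```python
from typing import Iterable, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.14.0' into (1, 14, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def add_and_sync(dataset, records: Iterable, argilla_version: str) -> None:
    """Add records and push explicitly only on versions before 1.14.0."""
    dataset.add_records(list(records))
    if parse_version(argilla_version) < (1, 14, 0):
        # Older versions return a local copy, so changes must be pushed.
        dataset.push_to_argilla()


class StandInDataset:
    """Hypothetical stand-in that only logs which methods were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def add_records(self, records) -> None:
        self.calls.append(f"add_records({len(records)})")

    def push_to_argilla(self) -> None:
        self.calls.append("push_to_argilla()")


new_ds, old_ds = StandInDataset(), StandInDataset()
add_and_sync(new_ds, ["r1", "r2"], "1.14.0")
add_and_sync(old_ds, ["r1", "r2"], "1.13.4")
print(new_ds.calls)
print(old_ds.calls)
```

With a real dataset you would pass the object returned by from_argilla in place of the stand-in.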

To learn about the format that these records follow, check this page or go to our cheatsheet.

Delete existing records#

From v1.14.0, you can delete records from a FeedbackDataset in Argilla. As noted above, from that version on from_argilla returns a remote FeedbackDataset, so additions, updates, and deletions are pushed to Argilla directly, with no need to call push_to_argilla.

The first alternative is to call the delete method over a single FeedbackRecord in the dataset, which will delete that record from Argilla.

# load the dataset
dataset = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")
# delete a specific record
dataset.records[0].delete()

Alternatively, you can select one or more records (FeedbackRecords) from the existing FeedbackDataset and call the delete_records method to delete them from Argilla.

# load the dataset
dataset = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")
# delete a list of records from a dataset
dataset.delete_records(list(dataset.records[:5]))
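
In practice you will often want to delete records that match a condition rather than a fixed slice. The sketch below shows one way to do that, assuming only that the dataset exposes `records` and `delete_records` as in the snippets above and that each record has a `fields` dict. `delete_matching` and the stand-in classes are hypothetical and exist only so the example runs without an Argilla server.

```python
from typing import Callable, List


def delete_matching(dataset, should_delete: Callable[[object], bool]) -> int:
    """Delete every record for which should_delete(record) is true."""
    # Collect the matches first so nothing is removed while iterating.
    to_delete = [record for record in dataset.records if should_delete(record)]
    if to_delete:
        dataset.delete_records(to_delete)
    return len(to_delete)


class StandInRecord:
    """Hypothetical stand-in for a record with a `fields` dict."""

    def __init__(self, fields: dict) -> None:
        self.fields = fields


class StandInDataset:
    """Hypothetical stand-in that keeps its records in a plain list."""

    def __init__(self, records: List[StandInRecord]) -> None:
        self.records = records

    def delete_records(self, records: List[StandInRecord]) -> None:
        self.records = [r for r in self.records if r not in records]


dataset = StandInDataset(
    [
        StandInRecord({"text": "keep me"}),
        StandInRecord({"text": "   "}),
        StandInRecord({"text": "also keep"}),
    ]
)
# Remove records whose text field is empty or whitespace only.
removed = delete_matching(dataset, lambda r: not r.fields["text"].strip())
remaining = [r.fields["text"] for r in dataset.records]
print(removed, remaining)
```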

Update existing records#

Add or update suggestions#

You can also add suggestions to records that have already been pushed to Argilla and, from v1.14.0, update existing ones.

Argilla 1.14.0 or higher: use the update method of each record to add new suggestions or update existing ones.

Note

If you include in this method a suggestion for a question that already has one, this will overwrite the previous suggestion.

# load the dataset
dataset = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")
# loop through the records and add suggestions
for record in dataset.records:
    record.update(suggestions=[...])

Lower than Argilla 1.14.0: the set_suggestions method will only add suggestions to records that don't have them. To update existing suggestions, upgrade to v1.14.0 or higher and follow the snippet above.

# load the dataset
dataset = rg.FeedbackDataset.from_argilla(name="my_dataset", workspace="my_workspace")
# loop through the records and add suggestions
for record in dataset.records:
    record.set_suggestions([...])
dataset.push_to_argilla()

To learn about the schema that these suggestions should follow check this page.
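
As an illustration of filling in the `[...]` placeholder, the sketch below builds one suggestion per question from simple prediction functions and shows the overwrite behaviour described in the note. The dict keys `question_name`, `value`, and `agent` are an assumption about the suggestion schema and should be checked against the schema page. `build_suggestions`, the predictors, and `StandInRecord` are hypothetical helpers, and the stand-in only mimics "one suggestion per question".

```python
from typing import Callable, Dict, List


def build_suggestions(
    text: str, predictors: Dict[str, Callable[[str], object]], agent: str
) -> List[dict]:
    """Return one suggestion dict per question name (assumed schema)."""
    return [
        {"question_name": name, "value": predict(text), "agent": agent}
        for name, predict in predictors.items()
    ]


class StandInRecord:
    """Hypothetical stand-in that keeps one suggestion per question."""

    def __init__(self, fields: dict) -> None:
        self.fields = fields
        self.suggestions: Dict[str, dict] = {}

    def update(self, suggestions: List[dict]) -> None:
        for suggestion in suggestions:
            # A suggestion for an already-covered question replaces the old one.
            self.suggestions[suggestion["question_name"]] = suggestion


predictors = {
    "sentiment": lambda text: "positive" if "good" in text else "negative",
    "length": lambda text: len(text.split()),
}
record = StandInRecord({"text": "a good example"})
record.update(
    suggestions=build_suggestions(record.fields["text"], predictors, agent="rule-based-v1")
)
# A second update for the same question overwrites the earlier suggestion.
record.update(
    suggestions=[{"question_name": "sentiment", "value": "negative", "agent": "reviewer"}]
)
print(record.suggestions)
```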

Delete suggestions#

From Argilla v1.15.0, you can also delete suggestions from existing records in Argilla via either the delete_suggestions method available for every record in Argilla, or via the delete method of every suggestion.

To delete some or all the suggestions from a FeedbackRecord pushed to Argilla, you can do the following:

dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# Delete ALL the suggestions from a record in Argilla
for record in dataset.records:
    record.delete_suggestions(list(record.suggestions))

# Delete just the first 2 suggestions from a record in Argilla
for record in dataset.records:
    record.delete_suggestions(list(record.suggestions[:2]))

Or just delete a single suggestion from a record in Argilla:

dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# Delete the first suggestion from a record in Argilla
dataset.records[0].suggestions[0].delete()
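
Deleting by position can be fragile when the order of suggestions is not known. A possible alternative, sketched below, selects suggestions by question name before calling delete_suggestions. It assumes each suggestion exposes a `question_name` attribute. `delete_suggestions_for` and the stand-in classes are hypothetical and only make the example runnable offline.

```python
from dataclasses import dataclass
from typing import Iterable


def delete_suggestions_for(record, question_names: Iterable[str]) -> int:
    """Delete the suggestions of a record that belong to the given questions."""
    wanted = set(question_names)
    matches = [s for s in record.suggestions if s.question_name in wanted]
    if matches:
        record.delete_suggestions(matches)
    return len(matches)


@dataclass
class StandInSuggestion:
    """Hypothetical stand-in for a suggestion."""

    question_name: str
    value: object


class StandInRecord:
    """Hypothetical stand-in for a record holding suggestions."""

    def __init__(self, suggestions) -> None:
        self.suggestions = tuple(suggestions)

    def delete_suggestions(self, suggestions) -> None:
        self.suggestions = tuple(s for s in self.suggestions if s not in suggestions)


record = StandInRecord(
    [
        StandInSuggestion("sentiment", "positive"),
        StandInSuggestion("topic", "sports"),
        StandInSuggestion("quality", 4),
    ]
)
deleted = delete_suggestions_for(record, ["sentiment", "quality"])
left = [s.question_name for s in record.suggestions]
print(deleted, left)
```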

Other datasets#

Warning

The records classes covered in this section correspond to three datasets: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text. These will be deprecated in Argilla 2.0 and replaced by the fully configurable FeedbackDataset class. Not sure which dataset to use? Check out our section on choosing a dataset.

Add records#

Records can be added to your dataset by logging them using the rg.log() function, just like you did when pushing records for the first time to Argilla (as explained here). If the records don't exist already in the dataset, these will be added to it.

Delete existing records#

You can delete records by passing their id into the rg.delete_records() function or using a query that matches the records. Learn more here.

import argilla as rg

# Delete by id
rg.delete_records(name="example-dataset", ids=[1, 3, 5])

# Discard records by query
rg.delete_records(name="example-dataset", query="metadata.code=33", discard_only=True)

Update existing records#

It is possible to update records from your Argilla datasets using our Python API. This approach works the same way as an upsert in a normal database, based on the record id. You can change any parameter of a record, and the stored record will be overwritten as long as you log it with the id of the original record.

import argilla as rg

# read all records in the dataset or define a specific search via the `query` parameter
records = rg.load("my_first_dataset")

# modify the first record's metadata (if there is no previous metadata dict you might need to create it)
records[0].metadata["my_metadata"] = "I'm a new value"

# log the record to update it; this keeps everything else and adds the my_metadata field and value
rg.log(name="my_first_dataset", records=records[0])
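
The upsert behaviour can be pictured with a plain dictionary keyed by record id. This is only a conceptual sketch and not how Argilla stores data: `upsert` and `store` are hypothetical names, and records are modelled as simple dicts.

```python
from typing import Dict, Hashable


def upsert(store: Dict[Hashable, dict], record: dict) -> str:
    """Insert the record, or replace the stored one that has the same id."""
    action = "updated" if record["id"] in store else "inserted"
    store[record["id"]] = record
    return action


store: Dict[Hashable, dict] = {}

# Logging a record with a new id adds it.
first = upsert(store, {"id": 1, "text": "hello", "metadata": {}})

# Load the record, change one field, and log it again under the same id.
loaded = dict(store[1])
loaded["metadata"] = {**loaded["metadata"], "my_metadata": "I'm a new value"}
second = upsert(store, loaded)

print(first, second, store)
```

Because the whole record is written back under its id, anything you did not modify after loading is preserved, while a record logged with an unseen id is simply added.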