Export a Feedback Dataset#

Your Argilla instance will always have all your datasets and annotations saved and accessible. However, if you'd like to save your dataset either locally or to the Hugging Face Hub, this section covers some useful methods to do just that.

Note

The methods mentioned on this page are only available for FeedbackDataset. For other types of datasets, like TextClassificationDataset, TokenClassificationDataset, and Text2TextDataset, check this guide.

Pull from Argilla#

The first step will be to pull a dataset from Argilla with the FeedbackDataset.from_argilla() method. This method will return a new instance of FeedbackDataset with the same guidelines, fields, questions, and records (including responses if any) as the dataset in Argilla.

import argilla as rg

rg.init(api_url="...", api_key="...")  # connect to your Argilla instance first

dataset = rg.FeedbackDataset.from_argilla("my-dataset", workspace="my-workspace")

At this point, you can do any post-processing you may need with this dataset, e.g. unifying responses from multiple annotators. Once you're happy with the result, you can decide on some of the following options to save it.
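As a minimal illustration of one such post-processing step, the sketch below unifies annotator responses by majority vote. It uses plain Python over hypothetical response dicts, not the exact record structure Argilla returns:

```python
from collections import Counter

def majority_label(responses):
    """Return the most frequent label among submitted responses, or None.

    `responses` is a list of hypothetical dicts with "status" and "label" keys,
    standing in for the per-annotator responses attached to a record.
    """
    labels = [r["label"] for r in responses if r.get("status") == "submitted"]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]

responses = [
    {"status": "submitted", "label": "positive"},
    {"status": "submitted", "label": "positive"},
    {"status": "discarded", "label": "negative"},
]
print(majority_label(responses))  # positive
```

Discarded responses are skipped, so only submitted annotations count toward the vote.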

Push back to Argilla#

You can always push the dataset back to Argilla in case you want to clone the dataset or explore it after post-processing.

dataset.push_to_argilla(name="my-dataset-clone", workspace="my-workspace")

Push to the Hugging Face Hub#

It is also possible to save a FeedbackDataset to and load it from the Hugging Face Hub for persistence. The push_to_huggingface and from_huggingface methods allow you to push to or pull from the Hugging Face Hub, respectively.

When pushing a FeedbackDataset to the Hugging Face Hub, you can provide the generate_card parameter to also generate and push the Dataset Card. generate_card defaults to True, so the card will always be generated unless you specify generate_card=False.

# Push to the Hugging Face Hub
dataset.push_to_huggingface("argilla/my-dataset")

# Push to the Hugging Face Hub as a private dataset
dataset.push_to_huggingface("argilla/my-dataset", private=True, token="...")
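For example, to skip generating the Dataset Card, pass generate_card=False explicitly:

```python
# Push to the Hugging Face Hub without generating a Dataset Card
dataset.push_to_huggingface("argilla/my-dataset", generate_card=False)
```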

Note that the FeedbackDataset.push_to_huggingface() method uploads not just the dataset records, but also a configuration file named argilla.cfg, which contains the dataset configuration, i.e. the fields, questions, and guidelines, if any. This way you can load any FeedbackDataset that has been pushed to the Hub back into Argilla using the from_huggingface method.

# Load a public dataset
dataset = rg.FeedbackDataset.from_huggingface("argilla/my-dataset")

# Load a private dataset
dataset = rg.FeedbackDataset.from_huggingface("argilla/my-dataset", use_auth_token=True)

Note

The args and kwargs of push_to_huggingface are those of push_to_hub from 🤗 Datasets, and the ones of from_huggingface are those of load_dataset from 🤗 Datasets.

Save to disk#

Additionally, thanks to the integration with 🤗 Datasets, you can also export the records of a FeedbackDataset locally in your preferred format by first converting the FeedbackDataset to a datasets.Dataset using the format_as("datasets") method. Then, you can export the datasets.Dataset to CSV, JSON, Parquet, etc. Check all the options in the 🤗 Datasets documentation.

hf_dataset = dataset.format_as("datasets")

hf_dataset.save_to_disk("sharegpt-prompt-rating-mini")  # Save as a `datasets.Dataset` in the local filesystem
hf_dataset.to_csv("sharegpt-prompt-rating-mini.csv")  # Save as CSV
hf_dataset.to_json("sharegpt-prompt-rating-mini.json")  # Save as JSON
hf_dataset.to_parquet("sharegpt-prompt-rating-mini.parquet")  # Save as Parquet

Note

This workaround will only export the records in the desired format, not the dataset configuration. If you want to load the records back into Argilla, you will need to create a FeedbackDataset and add the records as explained here.
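As a rough sketch of that last step, assuming the Argilla Python client: the configuration is recreated by hand (the field, question, and record values below are hypothetical), and the exported records are then added back:

```python
import argilla as rg

# Recreate the dataset configuration manually, since the exported records
# do not include it. Field and question names here are hypothetical.
dataset = rg.FeedbackDataset(
    guidelines="Rate the quality of the prompt.",
    fields=[rg.TextField(name="prompt")],
    questions=[rg.RatingQuestion(name="rating", values=[1, 2, 3, 4, 5])],
)

# Add the exported records back, e.g. rows read from the CSV/JSON files above
dataset.add_records(
    [rg.FeedbackRecord(fields={"prompt": "What is Argilla?"})]
)
```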