Welcome to Argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets.
To get started:
- Get started in 5 minutes!
  Deploy Argilla for free on the Hugging Face Hub or with Docker. Install the Python SDK with pip and create your first project.
- How-to guides
  Get familiar with the basic workflows of Argilla. Learn how to manage Users, Workspaces, Datasets, and Records to set up your data annotation projects.
Or, play with the Argilla UI by signing in with your Hugging Face account.
Looking for Argilla 1.x?
Looking for documentation for Argilla 1.x? Visit the latest release.
Migrate to Argilla 2.x
Want to learn how to migrate from Argilla 1.x to 2.x? Take a look at our dedicated Migration Guide.
Why use Argilla?
Argilla can be used to collect human feedback for a wide variety of AI projects, such as traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text-to-image, etc.).
Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.
Improve your AI output quality through data quality
Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both of these problems at once. Argilla helps you achieve and maintain high quality standards for your data, which in turn improves the quality of your AI outputs.
Take control of your data and models
Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.
Improve efficiency by quickly iterating on the right data and models
Gathering data is a time-consuming process. Argilla helps by providing a tool that allows you to interact with your data in a more engaging way. You can quickly and easily label your data with filters, AI feedback suggestions and semantic search, so you can focus on training your models and monitoring their performance.
What do people build with Argilla?
Datasets and models
Argilla is a tool that can be used to achieve and keep high-quality data standards with a focus on NLP and LLMs. The community uses Argilla to create amazing open-source datasets and models, and we love contributions to open-source too.
- The cleaned UltraFeedback dataset and the Notus and Notux models, where we improved benchmark results and empirical human judgment for the Mistral and Mixtral models with cleaner data using human feedback.
- The distilabeled Intel Orca DPO dataset and the improved OpenHermes model, which show how we improved model performance by filtering out 50% of the original dataset through human and AI feedback.
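
To give a feel for what filtering a preference dataset by feedback can look like, here is a minimal sketch in plain Python. It is not the pipeline used for the datasets above and does not use the Argilla SDK: the `PreferencePair` class, the rating fields, and the `min_gap` threshold are all hypothetical names chosen for illustration. The idea is simply to keep only the pairs where the feedback scores clearly agree with the chosen/rejected labels.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference example; all field names are hypothetical."""
    prompt: str
    chosen: str
    rejected: str
    chosen_rating: float    # assumed feedback score for `chosen`
    rejected_rating: float  # assumed feedback score for `rejected`


def filter_pairs(pairs, min_gap=1.0):
    """Keep pairs whose chosen response outscores the rejected one by at least `min_gap`."""
    return [p for p in pairs if p.chosen_rating - p.rejected_rating >= min_gap]


pairs = [
    PreferencePair("Q1", "answer a", "answer b", 9.0, 4.0),  # clear preference
    PreferencePair("Q2", "answer a", "answer b", 7.0, 7.0),  # tie, dropped
    PreferencePair("Q3", "answer a", "answer b", 5.0, 8.0),  # scores contradict the label, dropped
    PreferencePair("Q4", "answer a", "answer b", 8.0, 6.5),  # modest but sufficient gap
]

kept = filter_pairs(pairs)
print(f"kept {len(kept)} of {len(pairs)} pairs")
```

In a real project the ratings would come from annotators or a feedback model, and the threshold would be tuned by inspecting the borderline pairs rather than fixed up front.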
Projects and pipelines
AI teams from companies like the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in the AI community meetup.
- AI for good: the Red Cross presentation showcases how their experts and AI team collaborate by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
- Customer support: during the Loris meetup, they showed how their AI team uses unsupervised and few-shot contrastive learning to quickly validate and gain labeled samples for a huge number of multi-label classifiers.
- Research studies: the showcase from Prolific announced their integration with Argilla. They use it to actively distribute data collection projects among their annotating workforce. This allows them to quickly and efficiently collect high-quality data for their research studies.