👨🏽‍💻 Deploying#
Argilla currently gives users several ways to log model predictions.
This brief guide introduces the different methods and expected usages.
Using rg.monitor#
For widely used libraries, Argilla includes an "auto-monitoring" option via the rg.monitor method. Currently supported libraries are Hugging Face Transformers, spaCy, and Flair. If you'd like to see another library supported, feel free to open a discussion or issue on GitHub.
rg.monitor wraps Hugging Face and spaCy pipelines so that every time you call them, the output is logged to the dataset of your choice as a non-blocking background process. Additionally, rg.monitor adds several tags to your dataset, such as the library build version, the model name, and the language. This should also work for custom (private) pipelines, not only the Hub's or official spaCy models.
It is worth noting that this feature is useful beyond monitoring, and can be used for data collection (e.g., bootstrapping data annotation with pre-trained pipelines), model development (e.g., error analysis), and model evaluation (e.g., combined with data annotation to obtain evaluation metrics).
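To make the wrapping idea concrete, here is a minimal conceptual sketch. It is not Argilla's implementation: the `MonitoredPipeline` class, the `toy_sentiment` function, and the in-memory `records` list are hypothetical names introduced only for illustration, and the logging here is synchronous for simplicity, whereas rg.monitor logs in the background.

```python
class MonitoredPipeline:
    """Hypothetical wrapper for illustration; not Argilla's implementation."""

    def __init__(self, pipeline, dataset, tags=None):
        self._pipeline = pipeline      # any callable: text -> prediction
        self.dataset = dataset         # name of the dataset to log into
        self.tags = dict(tags or {})   # dataset-level tags (model name, etc.)
        self.records = []              # in-memory stand-in for the dataset

    def __call__(self, text):
        prediction = self._pipeline(text)
        # Log input and output together, then return the prediction unchanged.
        self.records.append({"text": text, "prediction": prediction})
        return prediction


def toy_sentiment(text):
    # Toy classifier used only for this demo.
    return "POSITIVE" if "good" in text.lower() else "NEGATIVE"


monitored = MonitoredPipeline(
    toy_sentiment, dataset="demo_monitoring", tags={"model": "toy_sentiment"}
)
output = monitored("A good movie")
```

The caller uses `monitored` exactly like the original callable, which is why a wrapped pipeline can be dropped into existing code such as a `dataset.map` call.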
Let's see it in action using the IMDB dataset:
[ ]:
from datasets import load_dataset
dataset = load_dataset("imdb", split="test[0:1000]")
Hugging Face Transformer Pipelines#
Argilla currently supports monitoring text-classification and zero-shot-classification pipelines, but token-classification and text-generation pipelines will be added in coming releases.
[ ]:
from transformers import pipeline
import argilla as rg
nlp = pipeline(
    "sentiment-analysis", return_all_scores=True, padding=True, truncation=True
)
nlp = rg.monitor(nlp, dataset="nlp_monitoring")
dataset.map(lambda example: {"prediction": nlp(example["text"])})
Once the map operation starts, you can start browsing the predictions in the web app. The default Argilla installation comes with Elastic's Kibana pre-configured, so you can easily build custom monitoring dashboards and alerts (for your team and other stakeholders).
Record-level metadata is a key element of Argilla datasets, enabling users to do fine-grained analysis and dataset slicing. Let's see how we can log metadata while using rg.monitor. Let's use the label column to add a news_category field to each record. (With the IMDB dataset loaded above, label holds the sentiment class; the news_category name fits a news dataset such as ag_news.)
[ ]:
dataset
[ ]:
dataset.map(
    lambda example: {
        "prediction": nlp(example["text"], metadata={"news_category": example["label"]})
    }
)
spaCy#
Argilla currently supports monitoring the NER pipeline component, but textcat will be added soon.
[ ]:
import spacy
import argilla as rg
nlp = spacy.load("en_core_web_sm")
nlp = rg.monitor(nlp, dataset="nlp_monitoring_spacy")
dataset.map(lambda example: {"prediction": nlp(example["text"])})
Once the map operation starts, you can start browsing the predictions in the web app.
Flair#
Argilla currently supports monitoring the Flair NER pipeline component.
[ ]:
import argilla as rg
from flair.data import Sentence
from flair.models import SequenceTagger
# load tagger
tagger = rg.monitor(
    SequenceTagger.load("flair/ner-english"), dataset="flair-example", sample_rate=1.0
)
# make example sentence
sentence = Sentence("George Washington went to Washington")
# predict NER tags. This will log the prediction in Argilla
tagger.predict(sentence)
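The sample_rate argument passed to rg.monitor above sets the portion of predictions that gets logged, so 1.0 logs every call. The following is a minimal sketch of that kind of sampling, assuming each call is sampled independently at random. The `should_log` helper is hypothetical and only illustrates the idea; it is not taken from Argilla.

```python
import random


def should_log(sample_rate, rng=random):
    """Return True for roughly `sample_rate` of the calls (hypothetical helper)."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so 1.0 always logs and 0.0 never does.
    return rng.random() < sample_rate


rng = random.Random(42)  # seeded so the demo is repeatable
kept = sum(should_log(0.3, rng) for _ in range(10_000))
```

A lower rate reduces storage and logging traffic for high-volume services, at the cost of seeing only a sample of the predictions.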
The following logs the predictions over the IMDB dataset:
[ ]:
def make_prediction(example):
    # Running the monitored tagger logs the prediction in Argilla
    tagger.predict(Sentence(example["text"]))
    return {"prediction": True}


dataset.map(make_prediction)
Using async rg.log#
You can monitor your own models without adding a response delay by using the background param in rg.log().
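As a rough illustration of what a background flag buys you, the sketch below hands the write to a worker thread and returns immediately. Everything in it is an assumption made for illustration: `log_records`, the `store` list, and the choice to return a `Future` are hypothetical and do not describe how rg.log is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# A single worker keeps writes in submission order.
_executor = ThreadPoolExecutor(max_workers=1)


def log_records(records, store, background=False):
    """Hypothetical logger: store `records`, optionally without blocking the caller."""

    def _write():
        store.extend(records)
        return len(records)

    if background:
        # Hand the write to the worker thread and return a Future right away.
        return _executor.submit(_write)
    return _write()


store = []
future = log_records([{"text": "hello"}], store, background=True)
processed = future.result(timeout=5)  # only needed when you must wait
```

In a prediction service you would normally not wait on the result, so the response is returned while the write completes on the worker thread.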
Let's see an example using BentoML with a spaCy NER pipeline:
[ ]:
import spacy
nlp = spacy.load("en_core_web_sm")
[ ]:
%%writefile spacy_model.py
from bentoml import BentoService, api, artifacts, env
from bentoml.adapters import JsonInput
from bentoml.frameworks.spacy import SpacyModelArtifact

import argilla as rg


@env(infer_pip_packages=True)
@artifacts([SpacyModelArtifact("nlp")])
class SpacyNERService(BentoService):
    @api(input=JsonInput(), batch=True)
    def predict(self, parsed_json_list):
        result, rb_records = ([], [])
        for index, parsed_json in enumerate(parsed_json_list):
            doc = self.artifacts.nlp(parsed_json["text"])
            # Response payload returned to the caller
            prediction = [{"entity": ent.text, "label": ent.label_} for ent in doc.ents]
            # Argilla record with entities as (label, start_char, end_char) spans
            rb_records.append(
                rg.TokenClassificationRecord(
                    text=doc.text,
                    tokens=[t.text for t in doc],
                    prediction=[
                        (ent.label_, ent.start_char, ent.end_char) for ent in doc.ents
                    ],
                )
            )
            result.append(prediction)

        # Log the whole batch once; with background=True the model latency won't be affected
        rg.log(
            name="monitor-for-spacy-ner",
            records=rb_records,
            tags={"framework": "bentoml"},
            background=True,
            verbose=False,
        )

        return result
[ ]:
from spacy_model import SpacyNERService
svc = SpacyNERService()
svc.pack("nlp", nlp)
saved_path = svc.save()
You can predict some data without serving the model. Just run the following command:
[ ]:
!bentoml run SpacyNERService:latest predict --input "{\"text\":\"I am driving BMW\"}"
If you're running Argilla locally, go to http://localhost:6900/datasets/argilla/monitor-for-spacy-ner and check that the new dataset monitor-for-spacy-ner contains your data.
Using ASGI middleware#
To use the ASGI middleware, see this tutorial.