langchain: Monitoring LLMs in apps, chains, agents, and tools#

This guide explains how to use the ArgillaCallbackHandler to integrate Argilla with LangChain apps. With this integration, Argilla can be used to evaluate and fine-tune LLMs: it works by collecting the interactions with LLMs and pushing them into a FeedbackDataset for continuous monitoring and human feedback. You just need to create a LangChain-compatible FeedbackDataset in Argilla and then instantiate the ArgillaCallbackHandler and provide it to your LangChain LLMs, Chains, and/or Agents.

Warning

As of Argilla 1.14.0, the FeedbackDataset has been refactored to improve its usage, so if you're using Argilla 1.14.0 or higher, you won't be able to use the ArgillaCallbackHandler, as it hasn't been updated in LangChain yet.
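If you still want to run the examples in this guide, a minimal guard you could add to your script is sketched below; it assumes the packaging package is installed and that the Argilla client exposes a __version__ attribute.

import argilla as rg
from packaging.version import parse

# The ArgillaCallbackHandler relies on the pre-1.14 FeedbackDataset API,
# so check that the installed Argilla client is older than 1.14.0.
assert parse(rg.__version__) < parse("1.14.0"), (
    "ArgillaCallbackHandler has not been updated for Argilla >= 1.14.0 yet"
)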

LangChain-compatible FeedbackDataset#

Due to the way LangChain callbacks and FeedbackDatasets work, we need to create a FeedbackDataset in Argilla with a certain structure for the fields, while the questions and the guidelines remain open and can be defined by the user.

The FeedbackDataset needs to have the following two fields: prompt and response. The prompt field will be used to provide the input to the LLMs, while the response field will be used to collect their output.

Then, regarding the questions and the guidelines, you are free to define them as you wish: they will not be used by the ArgillaCallbackHandler to collect the data generated by the LLMs, but they will be used when annotating the FeedbackDataset.

Here's an example of how to create a FeedbackDataset in Argilla that can be used with ArgillaCallbackHandler:

import argilla as rg

rg.init(
    api_url="...",
    api_key="..."
)

dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt", required=True),
        rg.TextField(name="response", required=True)
    ],
    questions=[
        rg.RatingQuestion(
            name="response-rating",
            description="How would you rate the quality of the response?",
            values=[1, 2, 3, 4, 5],
            required=True,
        ),
        rg.TextQuestion(
            name="response-correction",
            description="If you think the response is not accurate, please, correct it.",
            required=False,
        ),
    ],
    guidelines="Please, read the questions carefully and try to answer them as accurately as possible.",
)

Then you'll need to push that FeedbackDataset to Argilla as follows; otherwise, the ArgillaCallbackHandler won't work.

dataset.push_to_argilla("langchain-dataset")

For more information on how to create a FeedbackDataset, please refer to the Create a Feedback Dataset guide.
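Once records start flowing in and annotators submit their responses, you can pull the dataset back from Argilla to inspect or export the collected feedback. Below is a minimal sketch, assuming your Argilla client version supports FeedbackDataset.from_argilla and that the dataset lives in your default workspace.

import argilla as rg

rg.init(
    api_url="...",
    api_key="..."
)

# Retrieve the remote dataset, including the records logged by the callback
# and any annotations collected so far
annotated_dataset = rg.FeedbackDataset.from_argilla(name="langchain-dataset")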

Monitoring#

LangChain callbacks are instantiated once and provided to the LangChain LLMs, Chains, and/or Agents; after that, there's no need to worry about them anymore, as they will automatically keep track of everything taking place in the LangChain pipeline. In this case, we're keeping track of both the input and the final response provided by the LLMs, Chains, and/or Agents.

from langchain.callbacks import ArgillaCallbackHandler

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url="...",
    api_key="...",
)

An LLM#

First, let's just run a single LLM a few times and capture the resulting prompt-response pairs in Argilla.

from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler
from langchain.llms import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url="...",
    api_key="...",
)

llm = OpenAI(temperature=0.9, callbacks=[argilla_callback])
llm.generate(["Tell me a joke", "Tell me a poem"] * 3)

Argilla UI with LangChain LLM input-response

An LLM in a chain#

Next, we can create a chain using a prompt template and track the initial prompt and the final response in Argilla.

from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url="...",
    api_key="...",
)

llm = OpenAI(temperature=0.9, callbacks=[argilla_callback])

template = """You are a playwright. Given the title of a play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=[argilla_callback])

test_prompts = [{"title": "Documentary about Bigfoot in Paris"}]
synopsis_chain.apply(test_prompts)

Argilla UI with LangChain Chain input-response

Agents with Tools#

Finally, as a more advanced workflow, you can create an agent that uses some tools. The ArgillaCallbackHandler will keep track of the input and the output, but not of the intermediate steps/thoughts, so for a given prompt we log the original prompt and the final response to it.

Note that for this scenario we'll be using the Google Search API (Serp API), so you will need to install google-search-results with pip install google-search-results and set the Serp API key via os.environ["SERPAPI_API_KEY"] = "..." (you can find it at https://serpapi.com/dashboard); otherwise, the example below won't work.
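For reference, this is a minimal sketch of that setup; the placeholder key is yours to fill in.

import os

# Requires the google-search-results package (pip install google-search-results);
# the API key is available at https://serpapi.com/dashboard
os.environ["SERPAPI_API_KEY"] = "..."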

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler
from langchain.llms import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url="...",
    api_key="...",
)

llm = OpenAI(temperature=0.9, callbacks=[argilla_callback])
tools = load_tools(["serpapi"], llm=llm, callbacks=[argilla_callback])
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    callbacks=[argilla_callback],
)
agent.run("Who was the first president of the United States of America?")

Argilla UI with LangChain Agent input-response

Synthetic data#

If you want to create synthetic data with LangChain, you can use the ArgillaCallbackHandler to keep track of the input and the output of the LLMs, Chains, and/or Agents, and then store that data in Argilla. This means you would monitor the data in a similar scenario as described above, but instead provide a prompt tailored to data generation, so that your LLMs come up with synthetic data for a TextField. If you want a more tailored approach to data generation and computational feedback, you can take a look at this integration with LangChain or this tutorial on SetFit for suggestions.

Warning

Do keep in mind that LLMs come with licenses and that not every LLM can be used for creating synthetic data in every operational setting. Please check the license of the LLM you are using before using it to create synthetic data.

import random

from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler
from langchain.llms import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url="...",
    api_key="...",
)

topics = ["opening a new account", "applying for a loan", "applying for a credit card"]
sentiment = ["positive", "neutral", "negative"]

def get_prompt():
    prompt = (
        "Write a customer review for a bank. "
        f"Do that for the topic of {random.choice(topics)}. "
        f"Do that with a {random.choice(sentiment)} sentiment."
    )
    return prompt

llm = OpenAI(temperature=0.9, callbacks=[argilla_callback])
llm.generate([get_prompt() for _ in range(3)])