Overview
`agiflow_eval` is a customizable evaluation library designed to measure various metrics for language model outputs. It provides tools to evaluate aspects such as answer relevancy, hallucination, bias, faithfulness, contextual relevancy, and toxicity. The library is ported from the awesome DeepEval to support custom evaluation templates and LLM models.
Installation
First, ensure you have the necessary packages installed. You can install the required dependencies using pip:
```bash
pip install agiflow-eval
```
Usage
To use the metrics, first initialize the model and aggregator:
```python
from agiflow_eval import (
    EvalLiteLLM,
    MetadataAggregator,
)

metadata = MetadataAggregator()
model = EvalLiteLLM()
```
Then create the test case and measure the metric as follows:
```python
from agiflow_eval import ToxicityMetric, LLMTestCase

metric = ToxicityMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)
```
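Because `a_measure` is a coroutine, the call above has to run inside an async context. A minimal end-to-end sketch, using only the classes shown above and assuming a plain script entry point, might look like this:

```python
import asyncio

from agiflow_eval import (
    EvalLiteLLM,
    LLMTestCase,
    MetadataAggregator,
    ToxicityMetric,
)

async def main():
    # Shared evaluation model and metadata aggregator (as shown above)
    metadata = MetadataAggregator()
    model = EvalLiteLLM()

    # Build the metric and a test case, then score it asynchronously
    metric = ToxicityMetric(metadata=metadata, model=model)
    test_case = LLMTestCase(
        input="input text",
        actual_output="actual output text",
    )
    score = await metric.a_measure(test_case)
    print(f"Toxicity score: {score}")

if __name__ == "__main__":
    asyncio.run(main())
```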
Custom Template
You can simply extend the default metric template class and pass an instance of it to the metric class as follows:
```python
from agiflow_eval import ToxicityMetric, ToxicityTemplate, LLMTestCase

class YourTemplate(ToxicityTemplate):
    ...

metric = ToxicityMetric(metadata=metadata, model=model, template=YourTemplate())
```
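A metric built with a custom template is measured the same way as before. A brief usage sketch, reusing the `metadata`, `model`, and `metric` objects defined above:

```python
from agiflow_eval import LLMTestCase

# `metric` is the ToxicityMetric constructed with template=YourTemplate()
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)  # run inside an async function
```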