Overview
`agiflow_eval` is a customizable evaluation library designed to measure various metrics for language model outputs. It provides tools to evaluate aspects such as answer relevancy, hallucination, bias, faithfulness, contextual relevancy, and toxicity. The library is ported from the awesome DeepEval to support custom evaluation templates and LLM models.
Installation
First, ensure you have the necessary packages installed. You can install the required dependencies using pip:
```bash
pip install agiflow-eval
```
Usage
To use the metrics, first initialize the model and aggregator:
```python
from agiflow_eval import (
    EvalLiteLLM,
    MetadataAggregator,
)

metadata = MetadataAggregator()
model = EvalLiteLLM()
```
Then create the test case and measure the metric as follows:
```python
from agiflow_eval import ToxicityMetric, LLMTestCase

metric = ToxicityMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)
```
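Because `a_measure` is a coroutine, the call above has to run inside an async context. A minimal end-to-end sketch, using only the classes shown above and assuming a plain script entry point, might look like this:

```python
import asyncio

from agiflow_eval import (
    EvalLiteLLM,
    LLMTestCase,
    MetadataAggregator,
    ToxicityMetric,
)

async def main():
    # Shared evaluation model and metadata aggregator (as shown above)
    metadata = MetadataAggregator()
    model = EvalLiteLLM()

    # Build the metric and a test case, then score it asynchronously
    metric = ToxicityMetric(metadata=metadata, model=model)
    test_case = LLMTestCase(
        input="input text",
        actual_output="actual output text",
    )
    score = await metric.a_measure(test_case)
    print(f"Toxicity score: {score}")

if __name__ == "__main__":
    asyncio.run(main())
```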
Custom Template
You can simply extend the default metric template class and pass an instance of it to the metric class as follows:
```python
from agiflow_eval import ToxicityMetric, ToxicityTemplate, LLMTestCase

class YourTemplate(ToxicityTemplate):
    ...

metric = ToxicityMetric(metadata=metadata, model=model, template=YourTemplate())
```
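A metric built with a custom template is measured the same way as before. A brief usage sketch, reusing the `metadata`, `model`, and `metric` objects defined above:

```python
from agiflow_eval import LLMTestCase

# `metric` is the ToxicityMetric constructed with template=YourTemplate()
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)  # run inside an async function
```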