
eval

Debug and test your agent prompts and tools

What it does

The eval plugin helps you debug agents during development:

Generate Expected (after_user_input)

Generates a description of what should happen for the task to count as complete (unless 'expected' was already set by another plugin such as re_act).

Evaluate (on_complete)

After the agent finishes, evaluates whether the task was truly completed.

Quick Start

main.py
import builtins  # the plugin import below shadows the builtin eval, so we call it via builtins

from connectonion import Agent
from connectonion.useful_plugins import eval

def calculate(expression: str) -> str:
    """Calculate a math expression."""
    return str(builtins.eval(expression))

agent = Agent("assistant", tools=[calculate], plugins=[eval])

agent.input("What is 25 * 4?")
Python REPL
[Expected: Should calculate 25 * 4 and return 100]
[Tool: calculate("25 * 4")]
Result: 100
evaluating...
✓ Task complete: Calculated 25 * 4 = 100, which matches the expected result.
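
The expectation and the verdict are also stored on the agent's session, which is handy when poking around in a REPL. A minimal sketch, assuming current_session (and the keys set by the handlers shown under "How it works") is still readable after input() returns:

agent.input("What is 25 * 4?")

# Keys written by the plugin's handlers; assumes the session is still
# readable after the call returns.
print(agent.current_session.get('expected'))    # what should have happened
print(agent.current_session.get('evaluation'))  # the completion verdict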

Combined with re_act

When used with re_act, the eval plugin skips generating an expected outcome (re_act's plan serves as the expected value in the session):

main.py
from connectonion import Agent
from connectonion.useful_plugins import re_act, eval

def search(query: str) -> str:
    """Search the web (stub so the example runs)."""
    return f"Results for: {query}"

agent = Agent("assistant", tools=[search], plugins=[re_act, eval])

agent.input("Search for Python tutorials")
# re_act: plans the task (sets 'expected' in the session)
# Tools execute with reflection
# eval: evaluates completion (uses re_act's plan as the expected outcome)

How it works

1. Generate Expected

eval.py
@after_user_input
def generate_expected(agent):
    # Skip if already set by another plugin (e.g., re_act)
    if agent.current_session.get('expected'):
        return

    user_prompt = agent.current_session.get('user_prompt', '')
    tool_names = agent.tools.names()

    expected = llm_do(
        f"User request: {user_prompt}\nTools: {tool_names}\nWhat should happen?",
        model="co/gemini-2.5-flash"
    )

    agent.current_session['expected'] = expected

2. Evaluate Completion

eval.py
@on_complete
def evaluate_completion(agent):
    user_prompt = agent.current_session.get('user_prompt', '')
    result = agent.current_session.get('result', '')
    expected = agent.current_session.get('expected', '')
    trace = agent.current_session.get('trace', [])

    # Summarize actions taken
    actions = [f"- {t['tool_name']}: {t['result'][:100]}"
               for t in trace if t['type'] == 'tool_execution']

    evaluation = llm_do(
        f"Request: {user_prompt}\nExpected: {expected}\n"
        f"Actions: {actions}\nResult: {result}\n"
        f"Is this complete?",
        model="co/gemini-2.5-flash"
    )

    agent.current_session['evaluation'] = evaluation
    agent.logger.print(f"✓ {evaluation}")

Events Used

Event              Handler                Purpose
after_user_input   generate_expected      Set the expected outcome
on_complete        evaluate_completion    Evaluate whether the task was completed

Use Cases

  • Development: Verify your agent completes tasks correctly
  • Testing: Automated evaluation of agent responses (see the sketch after this list)
  • Debugging: Identify incomplete or incorrect tool usage
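
For the testing use case, the stored evaluation can be asserted on directly. A minimal pytest sketch (the file name and the keyword check on the free-form evaluation text are illustrative, and it assumes the session is still readable after input() returns):

test_agent_eval.py
import builtins

from connectonion import Agent
from connectonion.useful_plugins import eval

def calculate(expression: str) -> str:
    """Calculate a math expression."""
    return str(builtins.eval(expression))  # builtin eval, since the plugin import shadows it

def test_agent_completes_math_task():
    agent = Agent("assistant", tools=[calculate], plugins=[eval])
    agent.input("What is 25 * 4?")

    evaluation = agent.current_session.get('evaluation', '')
    # Free-form LLM output: a keyword check is a rough heuristic, not a guarantee.
    assert 'complete' in evaluation.lower()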

Source

connectonion/useful_plugins/eval.py

eval.py
# The plugin is just a list of event handlers
eval = [generate_expected, evaluate_completion]
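
Because a plugin is just a list of event handlers, you can reuse one handler and swap the other. A minimal sketch of a custom variant; the import paths for the stock generate_expected handler and the on_complete decorator are assumptions here (adjust them to match your install):

from connectonion import llm_do
from connectonion import on_complete  # assumed import path for the event decorator
from connectonion.useful_plugins.eval import generate_expected  # assumed import path

@on_complete
def evaluate_strict(agent):
    """Variant of evaluate_completion that flags incomplete runs with a cross mark."""
    evaluation = llm_do(
        f"Request: {agent.current_session.get('user_prompt', '')}\n"
        f"Expected: {agent.current_session.get('expected', '')}\n"
        f"Result: {agent.current_session.get('result', '')}\n"
        "Start your answer with YES or NO: is this complete?",
        model="co/gemini-2.5-flash"
    )
    agent.current_session['evaluation'] = evaluation
    mark = "✓" if evaluation.strip().upper().startswith("YES") else "✗"
    agent.logger.print(f"{mark} {evaluation}")

# A custom plugin is just another list of handlers.
strict_eval = [generate_expected, evaluate_strict]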