
eval

Debug and test your agent prompts and tools

What it does

The eval plugin helps you debug agents during development:

Generate Expected (after_user_input)

Generates a description of what should happen for the task to count as complete (unless 'expected' was already set by another plugin such as re_act).

Evaluate (on_complete)

After the agent finishes, evaluates whether the task was truly completed.

Quick Start

main.py
import builtins  # the plugin import below shadows the builtin eval, so we call it via builtins

from connectonion import Agent
from connectonion.useful_plugins import eval

def calculate(expression: str) -> str:
    """Calculate a math expression."""
    return str(builtins.eval(expression))

agent = Agent("assistant", tools=[calculate], plugins=[eval])

agent.input("What is 25 * 4?")
Python REPL
[Expected: Should calculate 25 * 4 and return 100]
[Tool: calculate("25 * 4")]
Result: 100
evaluating...
✓ Task complete: Calculated 25 * 4 = 100, which matches the expected result.
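
The expectation and the verdict are also stored on the agent's session, which is handy when poking around in a REPL. A minimal sketch, assuming current_session (and the keys set by the handlers shown under "How it works") is still readable after input() returns:

agent.input("What is 25 * 4?")

# Keys written by the plugin's handlers; assumes the session is still
# readable after the call returns.
print(agent.current_session.get('expected'))    # what should have happened
print(agent.current_session.get('evaluation'))  # the completion verdict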

Combined with re_act

When used with re_act, the eval plugin skips generating an expected outcome (re_act's plan serves as the expected value in the session):

main.py
from connectonion import Agent
from connectonion.useful_plugins import re_act, eval

def search(query: str) -> str:
    """Search the web (stub so the example runs)."""
    return f"Results for: {query}"

agent = Agent("assistant", tools=[search], plugins=[re_act, eval])

agent.input("Search for Python tutorials")
# re_act: plans the task (sets 'expected' in the session)
# Tools execute with reflection
# eval: evaluates completion (uses re_act's plan as the expected outcome)

How it works

1. Generate Expected

eval.py
@after_user_input
def generate_expected(agent):
    # Skip if already set by another plugin (e.g., re_act)
    if agent.current_session.get('expected'):
        return

    user_prompt = agent.current_session.get('user_prompt', '')
    tool_names = agent.tools.names()

    expected = llm_do(
        f"User request: {user_prompt}\nTools: {tool_names}\nWhat should happen?",
        model="co/gemini-2.5-flash"
    )

    agent.current_session['expected'] = expected

2. Evaluate Completion

eval.py
@on_complete
def evaluate_completion(agent):
    user_prompt = agent.current_session.get('user_prompt', '')
    result = agent.current_session.get('result', '')
    expected = agent.current_session.get('expected', '')
    trace = agent.current_session.get('trace', [])

    # Summarize actions taken
    actions = [f"- {t['tool_name']}: {t['result'][:100]}"
               for t in trace if t['type'] == 'tool_execution']

    evaluation = llm_do(
        f"Request: {user_prompt}\nExpected: {expected}\n"
        f"Actions: {actions}\nResult: {result}\n"
        f"Is this complete?",
        model="co/gemini-2.5-flash"
    )

    agent.current_session['evaluation'] = evaluation
    agent.logger.print(f"✓ {evaluation}")

Events Used

Event              Handler                Purpose
after_user_input   generate_expected      Set the expected outcome
on_complete        evaluate_completion    Evaluate whether the task was completed

Use Cases

  • Development: Verify your agent completes tasks correctly
  • Testing: Automated evaluation of agent responses (see the sketch after this list)
  • Debugging: Identify incomplete or incorrect tool usage
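
For the testing use case, the stored evaluation can be asserted on directly. A minimal pytest sketch (the file name and the keyword check on the free-form evaluation text are illustrative, and it assumes the session is still readable after input() returns):

test_agent_eval.py
import builtins

from connectonion import Agent
from connectonion.useful_plugins import eval

def calculate(expression: str) -> str:
    """Calculate a math expression."""
    return str(builtins.eval(expression))  # builtin eval, since the plugin import shadows it

def test_agent_completes_math_task():
    agent = Agent("assistant", tools=[calculate], plugins=[eval])
    agent.input("What is 25 * 4?")

    evaluation = agent.current_session.get('evaluation', '')
    # Free-form LLM output: a keyword check is a rough heuristic, not a guarantee.
    assert 'complete' in evaluation.lower()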

Source

connectonion/useful_plugins/eval.py

eval.py
# The plugin is just a list of event handlers
eval = [generate_expected, evaluate_completion]
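
Because a plugin is just a list of event handlers, you can reuse one handler and swap the other. A minimal sketch of a custom variant; the import paths for the stock generate_expected handler and the on_complete decorator are assumptions here (adjust them to match your install):

from connectonion import llm_do
from connectonion import on_complete  # assumed import path for the event decorator
from connectonion.useful_plugins.eval import generate_expected  # assumed import path

@on_complete
def evaluate_strict(agent):
    """Variant of evaluate_completion that flags incomplete runs with a cross mark."""
    evaluation = llm_do(
        f"Request: {agent.current_session.get('user_prompt', '')}\n"
        f"Expected: {agent.current_session.get('expected', '')}\n"
        f"Result: {agent.current_session.get('result', '')}\n"
        "Start your answer with YES or NO: is this complete?",
        model="co/gemini-2.5-flash"
    )
    agent.current_session['evaluation'] = evaluation
    mark = "✓" if evaluation.strip().upper().startswith("YES") else "✗"
    agent.logger.print(f"{mark} {evaluation}")

# A custom plugin is just another list of handlers.
strict_eval = [generate_expected, evaluate_strict]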