Evaluation Context Protocol (ECP)

View on GitHub | Quickstart | Specification

Portable evaluations for AI agents.

ECP is a vendor-neutral protocol for testing agent outputs, tool calls, and evaluator-visible audit context across frameworks, models, eval platforms, and CI systems.

Think of it as the evaluation contract layer: MCP gives agents a common way to use tools; ECP gives evaluators a common way to inspect what an agent returned, what tools it used, and what audit evidence it exposed.

Why ECP Exists

Agent evaluations are still too tied to individual frameworks, tracing tools, and hosted platforms. Those tools are useful, but the evaluation contract itself should be portable.

ECP separates the protocol from the platform:

run evals locally or in CI
wrap agents built with plain Python, LangChain, LlamaIndex, CrewAI, or PydanticAI
grade final outputs, tool calls, and evaluation_context
emit JSON and HTML reports that other systems can ingest
implement the same JSON-RPC contract in another language or runtime

What ECP Checks

Most evals start with the final answer. ECP also checks the behavior behind that answer.

Evaluation Need	ECP Surface
Did the user-visible answer satisfy the task?	`public_output`
Did the agent call the required tool with the right arguments?	`tool_calls`
Did the agent expose evaluator-safe audit evidence?	`evaluation_context`
Can this run in CI and fail a build?	`ecp run --manifest ...`

private_thought is still accepted as a compatibility alias, but new agents should use evaluation_context. ECP is not asking providers to expose raw chain-of-thought; it is asking agents to expose evaluator-safe evidence.

Developer Path

pip install "ecp-runtime==0.3.1" "ecp-sdk==0.3.1"
ecp init
ecp validate ecp_eval/manifest.yaml
ecp run --manifest ecp_eval/manifest.yaml --json

For a realistic example that shows why output-only evals are not enough:

ecp run --manifest examples/customer_support_demo/manifest.yaml --report report.html

What Is In This Repo

sdk/python/src/ecp - Python SDK for implementing ECP agents
runtime/python/src/ecp_runtime - reference runtime and ecp CLI
examples/customer_support_demo - flagship policy/tool-use demo
examples/*_demo - framework integrations
schema/ - JSON Schema contracts for manifests, agent results, tool calls, and reports
spec/ and docs/spec.md - protocol specification

Go to Quickstart to run your first eval, Examples for integration patterns, or Specification to implement ECP in another runtime.