Why ECP?

ECP is not trying to replace eval platforms. It standardizes the contract between agents and evaluators so teams can run evals locally, in CI, or inside whichever platform they already use.

The Gap

Agent teams often need to answer questions that final-output checks miss:

  • Did the agent call the required tool?
  • Did it use the right arguments?
  • Did it follow policy before responding?
  • Can we reproduce this in CI?
  • Can we move eval results between tools without rewriting the agent?
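
The first two questions can be checked mechanically once an agent's tool calls are recorded. The following is a minimal sketch, not ECP's actual API: it assumes each recorded call is a dict with hypothetical `name` and `arguments` keys, and `check_tool_call` is an illustrative helper invented for this example.

```python
from typing import Any


def check_tool_call(
    tool_calls: list[dict[str, Any]],
    required_tool: str,
    expected_args: dict[str, Any],
) -> tuple[bool, str]:
    """Return (passed, reason) for a required tool call.

    Assumption: each call looks like {"name": ..., "arguments": {...}}.
    The real ECP schema may use different keys.
    """
    seen = False
    for call in tool_calls:
        if call.get("name") != required_tool:
            continue
        seen = True
        args = call.get("arguments", {})
        # Pass as soon as one call carries every expected argument value.
        if all(k in args and args[k] == v for k, v in expected_args.items()):
            return True, "ok"
    if seen:
        return False, f"{required_tool} was called with unexpected arguments"
    return False, f"{required_tool} was never called"


# Hypothetical recorded calls from one agent run.
calls = [{"name": "lookup_order", "arguments": {"order_id": "A1"}}]
passed, reason = check_tool_call(calls, "lookup_order", {"order_id": "A1"})
print(passed, reason)
```

Because the check is a plain function over recorded data, it runs the same way on a laptop and in CI, which is the reproducibility the list above asks for.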

Tracing and hosted eval platforms help, but their data models are usually tied to the platform that produced them. ECP makes the agent evaluation surface portable.

ECP Compared

Tooling Category   What It Is Good At                  Where ECP Fits
Unit tests         deterministic code checks           ECP adds agent/tool/evaluation surfaces
LLM judges         semantic output grading             ECP makes judge inputs and results repeatable
Trace platforms    observability and debugging         ECP provides a small portable eval contract
Eval platforms     datasets, dashboards, experiments   ECP can feed or interoperate with platforms

The MCP Analogy

MCP standardizes how agents connect to tools.

ECP standardizes what agents expose for evaluation and how results are reported:

  • public_output
  • evaluation_context
  • tool_calls
  • manifest scenarios
  • grader results
  • portable reports
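
To make the list concrete, here is one way these pieces could fit together as plain data. This is a sketch under stated assumptions rather than the ECP schema: the field names `public_output`, `evaluation_context`, and `tool_calls` come from the list above, while the class names, the `GraderResult` fields, and the report layout are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class AgentResult:
    """What the agent exposes for one scenario (field names from the list above)."""
    public_output: str
    evaluation_context: dict[str, Any] = field(default_factory=dict)
    tool_calls: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class GraderResult:
    """Hypothetical shape for a single grader's verdict."""
    grader: str
    passed: bool
    detail: str = ""


def build_report(scenario_id: str, result: AgentResult, grades: list[GraderResult]) -> str:
    """Serialize one scenario's outcome to JSON so any tool can read it."""
    report = {
        "scenario": scenario_id,
        "result": asdict(result),
        "grades": [asdict(g) for g in grades],
        "passed": all(g.passed for g in grades),
    }
    return json.dumps(report, indent=2, sort_keys=True)


result = AgentResult(
    public_output="Your refund was issued.",
    evaluation_context={"policy_checked": True},
    tool_calls=[{"name": "issue_refund", "arguments": {"order_id": "A1"}}],
)
grades = [GraderResult(grader="tool_usage", passed=True)]
print(build_report("refund-basic", result, grades))
```

Keeping the report as plain JSON is the design choice that matters here: it can be stored as a CI artifact or loaded into an eval platform without either side depending on the other's internals.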

The Enterprise Angle

Enterprise teams care about auditability, regression testing, policy compliance, data boundaries, vendor flexibility, and CI workflows. ECP is designed to be boring infrastructure: a small contract that can sit under many tools instead of forcing every team into one hosted workflow.