What Changed Between Versions?
An IT helpdesk agent can handle tickets in two ways: conservatively (routing them to humans) or autonomously (resolving them itself). Analytics quantifies the behavioral shift between versions.
The Scenario
An IT helpdesk uses an agent to triage and route support tickets. v1.0.0 is conservative; v2.0.0 handles more tickets autonomously.
The Key Moment: The Trade-Off
v2.0.0 auto-resolves three times as many tickets (20% → 60%) but uses 56% more tokens and is 76% slower.
Analytics surfaces both sides: the improvement (fewer tickets reach human staff) and the cost (higher latency and token usage). The assessment is MIXED: you decide whether the trade-off is acceptable.
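The headline numbers are easy to sanity-check. A quick sketch using the illustrative 20% and 60% auto-resolve rates above:

```python
# Auto-resolve rates from the demo scenario (v1.0.0 vs v2.0.0).
auto_v1, auto_v2 = 0.20, 0.60

ratio = auto_v2 / auto_v1                     # ~3.0: three times as many
pct_increase = (auto_v2 - auto_v1) / auto_v1  # ~2.0: a 200% increase
```

The same 200% figure is what the significance levels later classify as a major change.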
What the Demo Shows
Metrics Aggregation
Aggregate traces into behavioral metrics: latency percentiles, token usage, error rates, and action distributions. See what your agent actually does, not just how fast it runs.
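As a sketch of what this aggregation involves, here is plain Python over hypothetical trace dicts (keys like `latency_ms` and `action` are illustrative, not the llmhq-releaseops API):

```python
from collections import Counter
from statistics import quantiles

def aggregate(traces):
    """Aggregate raw traces into behavioral metrics.

    `traces` is assumed to be a list of dicts with hypothetical keys:
    latency_ms (float), tokens (int), error (bool), action (str).
    """
    latencies = sorted(t["latency_ms"] for t in traces)
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = quantiles(latencies, n=100)
    return {
        "p50_latency_ms": cuts[49],
        "p95_latency_ms": cuts[94],
        "avg_tokens": sum(t["tokens"] for t in traces) / len(traces),
        "error_rate": sum(t["error"] for t in traces) / len(traces),
        # Action distribution: counts of what the agent actually did.
        "actions": Counter(t["action"] for t in traces),
    }
```

The action counter is the key addition over speed metrics: it captures behavior, not just performance.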
Action Distribution
The real story is in what actions the agent takes. How many tickets get auto-resolved vs routed to humans vs escalated? Action distribution shows the practical impact.
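Conceptually, an action distribution is just each action's share of all tickets. A minimal sketch (the action names are illustrative):

```python
from collections import Counter

def action_distribution(actions):
    """Fraction of tickets per action taken by the agent."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: count / total for action, count in counts.items()}
```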
Version Comparison
Compare any two versions with significance levels: major (>25%), moderate (>10%), minor (>5%), negligible (≤5%). An overall assessment flags each comparison as improvement, regression, mixed, or neutral.
Significance Levels
Not all changes matter equally. A 2% latency increase is negligible. A 200% increase in auto-resolution is major. Significance levels help you focus on what actually matters.
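The thresholds above amount to a small classifier over relative change, and the per-metric results roll up into the overall assessment. This is a hypothetical illustration of the logic, not the library's implementation:

```python
def significance(old, new):
    """Classify the relative change between two metric values."""
    change = abs(new - old) / abs(old)
    if change > 0.25:
        return "major"
    if change > 0.10:
        return "moderate"
    if change > 0.05:
        return "minor"
    return "negligible"

def assessment(changes):
    """Roll per-metric results into one verdict.

    `changes` is a list of (direction, level) pairs, where direction is
    "better" or "worse" and level comes from significance().
    """
    significant = [d for d, level in changes if level != "negligible"]
    if not significant:
        return "neutral"
    if all(d == "better" for d in significant):
        return "improvement"
    if all(d == "worse" for d in significant):
        return "regression"
    return "mixed"
```

Under this scheme, the demo's 20% → 60% auto-resolution jump is major, while a 2% latency drift is negligible and is ignored in the verdict.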
Try It Yourself
Analytics runs locally with MockPlatform for testing. Connect to LangSmith for production trace data.
pip install llmhq-promptops llmhq-releaseops
releaseops analytics metrics my-agent@1.0.0
releaseops analytics compare \
my-agent@1.0.0 my-agent@2.0.0