Quality Gates That Block Bad Releases
A fintech loan agent auto-approves a $500K mortgage that should go to underwriting. The eval gate catches the compliance violation before it reaches staging.
The Scenario
A fintech company uses an agent to pre-screen loan applications. Regulatory compliance requires that certain rules always be enforced.
The Key Moment: $500K Mortgage Approved
v1.1.0 auto-approved a $500K mortgage instead of escalating it to an underwriter. In fintech, this is a compliance violation — mortgages above $100K require human review.
The eval gate caught it: the score dropped from 4/4 (100%) to 2/4 (50%). Promotion to staging was blocked.
What the Demo Shows
Eval Suites
Define test cases with inputs, expected behaviors, and assertions. Each case targets a specific scenario: approvals, escalations, rejections, and edge cases.
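To make the structure concrete, here is a minimal sketch of what such a suite could look like in Python. The class and field names are illustrative assumptions, not the actual releaseops schema:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    name: str       # what scenario this case covers
    input: str      # applicant scenario fed to the agent
    expected: str   # expected behavior, e.g. "escalate"
    judge: str = "exact_match"  # assertion type used to score the output

# Hypothetical suite mirroring the four scenarios described above
suite = [
    EvalCase("small_loan_approval", "loan: $40K, credit: 760", "approve"),
    EvalCase("large_mortgage_escalation", "mortgage: $500K", "escalate"),
    EvalCase("low_credit_rejection", "loan: $20K, credit: 480", "reject"),
    EvalCase("edge_exact_threshold", "mortgage: $100K", "escalate", judge="contains"),
]
```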
Pluggable Judges
Assertions use pluggable judges: exact match, contains, regex, LLM-as-judge, or composites. v1.0.0 passes all 4 cases with deterministic judges.
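The deterministic judges are simple predicates over expected and actual output, and composites combine them. A rough sketch of the idea (function names are assumptions, not the library's API):

```python
import re
from typing import Callable

# A judge takes (expected, actual) and returns pass/fail
Judge = Callable[[str, str], bool]

def exact_match(expected: str, actual: str) -> bool:
    return expected.strip() == actual.strip()

def contains(expected: str, actual: str) -> bool:
    return expected in actual

def regex(pattern: str, actual: str) -> bool:
    return re.search(pattern, actual) is not None

def all_of(*judges: Judge) -> Judge:
    # Composite judge: passes only if every sub-judge passes
    def composite(expected: str, actual: str) -> bool:
        return all(j(expected, actual) for j in judges)
    return composite
```

An LLM-as-judge would slot into the same `Judge` signature, with a model call replacing the string comparison.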
Gate Blocking
v1.1.0 fails 2 of 4 cases. The eval gate blocks promotion: releaseops promote returns an error with the failing assertions and expected-vs-actual output.
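The gating decision itself reduces to comparing the pass rate against a threshold. A minimal sketch, assuming a strict 100% threshold as in the demo:

```python
def gate(results: list[bool], threshold: float = 1.0) -> bool:
    """Allow promotion only if the pass rate meets the threshold."""
    score = sum(results) / len(results)
    return score >= threshold

# v1.0.0 passes 4/4 cases and is promoted;
# v1.1.0 passes only 2/4 and is blocked.
v1_0_0 = gate([True, True, True, True])
v1_1_0 = gate([True, False, True, False])
```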
Eval Reports
Full eval reports in markdown or JSON format. Case-by-case results, score summaries, and gate decisions. Fix the prompt, re-run the eval, and promote once it passes.
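A report of this shape could be rendered in a few lines; the function below is an illustrative sketch, not the actual releaseops report format:

```python
def render_report(case_names: list[str], results: list[bool]) -> str:
    """Render a simple markdown report: score summary plus per-case results."""
    passed = sum(results)
    lines = ["# Eval Report", "", f"Score: {passed}/{len(results)}", ""]
    for name, ok in zip(case_names, results):
        lines.append(f"- {'PASS' if ok else 'FAIL'}: {name}")
    return "\n".join(lines)

report = render_report(
    ["small_loan_approval", "large_mortgage_escalation"],
    [True, False],
)
```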
Try It Yourself
Define eval suites and run them locally. No API keys needed for deterministic judges (exact match, contains, regex).
pip install llmhq-promptops llmhq-releaseops
releaseops eval create my-suite
releaseops eval run my-suite -b my-agent -v 1.0.0