Quality Gates That Block Bad Releases

A fintech loan agent auto-approves a $500K mortgage that should go to underwriting. The eval gate catches the compliance violation before it reaches staging.


The Scenario

A fintech company uses an agent to pre-screen loan applications. Regulatory compliance requires that certain rules always be enforced.

Application: "$500K mortgage application, credit score 780"

- v1.0.0 escalates to an underwriter (rule: escalate mortgages over $100K)
- v1.1.0 auto-approves (buggy rule: auto-approve all under $100K)

The Key Moment: $500K Mortgage Approved

v1.1.0 auto-approved a $500K mortgage instead of escalating it to an underwriter. In fintech, this is a compliance violation — mortgages above $100K require human review.

The eval gate caught it: score dropped from 4/4 (100%) to 2/4 (50%). Promotion to staging was blocked.
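The regression can be pictured as a one-line rule change. This is a minimal sketch, assuming the routing logic reduces to a threshold check; the function names and exact rule shape are illustrative, not the demo's actual prompts.

```python
# Illustrative sketch of the two routing rules (not the real prompts).

ESCALATION_THRESHOLD = 100_000  # mortgages above this require human review


def route_v1_0_0(amount: int) -> str:
    """v1.0.0 rule: escalate mortgages over $100K, auto-approve the rest."""
    return "escalate" if amount > ESCALATION_THRESHOLD else "auto-approve"


def route_v1_1_0(amount: int) -> str:
    """v1.1.0 rule (buggy): rewritten as 'auto-approve all under $100K',
    it dropped the escalation branch, so large mortgages fall through
    to auto-approval instead of going to an underwriter."""
    return "auto-approve"
```

Run against the key case, v1.0.0 returns "escalate" for $500K while v1.1.0 returns "auto-approve", which is exactly the violation the gate flags.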

What the Demo Shows

Act 1 — Define

Eval Suites

Define test cases with inputs, expected behaviors, and assertions. Each case targets a specific scenario: approvals, escalations, rejections, and edge cases.
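A suite like the one in the demo might be modeled as below. This is a hedged sketch: the `EvalCase` dataclass, its field names, and the sample cases are hypothetical, not the library's real schema.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One eval case: an input, the expected behavior, and the judge to use.
    (Illustrative structure only, not llmhq-releaseops's actual schema.)"""
    name: str
    input: str
    expected: str
    judge: str = "exact_match"


# Sample cases covering approvals, escalations, rejections, and an edge case.
cases = [
    EvalCase("large mortgage escalates", "$500K mortgage, credit 780", "escalate"),
    EvalCase("small loan auto-approves", "$50K auto loan, credit 720", "auto-approve"),
    EvalCase("poor credit rejects", "$20K loan, credit 480", "reject"),
    EvalCase("missing income escalates", "$80K loan, no income docs", "escalate"),
]
```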

Act 2 — Pass

Pluggable Judges

Assertions use pluggable judges: exact match, contains, regex, LLM-as-judge, or composites. v1.0.0 passes all 4 cases with deterministic judges.
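The judge types listed above can be sketched as interchangeable functions sharing one signature. The names and composite helper here are assumptions for illustration, not the library's API.

```python
import re
from typing import Callable

# A judge takes (expected, actual) and returns pass/fail.
Judge = Callable[[str, str], bool]


def exact_match(expected: str, actual: str) -> bool:
    return expected.strip() == actual.strip()


def contains(expected: str, actual: str) -> bool:
    return expected in actual


def regex(pattern: str, actual: str) -> bool:
    return re.search(pattern, actual) is not None


def all_of(*judges: Judge) -> Judge:
    """Composite judge: passes only if every sub-judge passes."""
    def composite(expected: str, actual: str) -> bool:
        return all(j(expected, actual) for j in judges)
    return composite
```

An LLM-as-judge would plug into the same signature, calling a model instead of comparing strings; the deterministic judges above need no API keys.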

Act 3 — Fail

Gate Blocking

v1.1.0 fails 2 of 4 cases. The eval gate blocks promotion — releaseops promote returns an error with the failing assertions and expected-vs-actual output.
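The gate decision itself reduces to a pass-rate check. This sketch assumes a 100% threshold (all cases must pass); the function name and signature are illustrative.

```python
def gate_decision(case_results: list[bool], threshold: float = 1.0):
    """Return (score, blocked): block promotion when the pass rate
    falls below the gate threshold. Illustrative, not the real CLI logic."""
    score = sum(case_results) / len(case_results)
    return score, score < threshold
```

With v1.1.0's results, `gate_decision([True, True, False, False])` yields a 0.5 score and blocks promotion; v1.0.0's four passes yield 1.0 and clear the gate.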

Act 4 — Fix

Eval Reports

Full eval reports in markdown or JSON format. Case-by-case results, score summaries, and gate decisions. Fix the prompt, re-run the eval, and promote once it passes.
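A report renderer covering both formats can be sketched as follows; the result-dict keys and output layout are assumptions, not the tool's actual report schema.

```python
import json


def render_report(results: list[dict], fmt: str = "markdown") -> str:
    """Render case-by-case results plus a score summary as markdown or JSON.
    (Illustrative output layout, not llmhq-releaseops's real report format.)"""
    passed = sum(r["passed"] for r in results)
    summary = f"{passed}/{len(results)}"
    if fmt == "json":
        return json.dumps({"score": summary, "cases": results}, indent=2)
    lines = [f"Score: {summary}", ""]
    for r in results:
        status = "PASS" if r["passed"] else "FAIL"
        lines.append(
            f"- [{status}] {r['name']}: "
            f"expected '{r['expected']}', got '{r['actual']}'"
        )
    return "\n".join(lines)
```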

Try It Yourself

Define eval suites and run them locally. No API keys needed for deterministic judges (exact match, contains, regex).

```shell
pip install llmhq-promptops llmhq-releaseops
releaseops eval create my-suite
releaseops eval run my-suite -b my-agent -v 1.0.0
```