PromptOps + ReleaseOps Demo

Two prompt versions. Same customer requests. Different agent behavior. Watch every change get versioned, bundled, and traced.


The Scenario

A customer support agent handles 5 refund requests. One prompt change shifts the threshold.

Customer: "I'd like a refund for my $120 purchase"

v1.0.0: Escalate to supervisor (rule: escalate refunds over $50)
vs
v1.1.0: Auto-approve refund (rule: auto-approve up to $200)

The Key Moment: $120 Refund

The medium refund scenario ($120 purchase) is where behavior diverges. v1.0.0 escalates it to a supervisor (over the $50 limit), while v1.1.0 auto-approves it (under the $200 limit).

Attribution traces this decision back to the exact line in the system prompt where the threshold changed: $50 → $200.
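Tracing a behavior shift to a changed prompt line starts with a line-level diff between the two versions. A minimal sketch of that idea using Python's `difflib` (the prompt bodies and the `changed_lines` helper are illustrative, not the library's actual attribution implementation):

```python
import difflib

# Hypothetical prompt bodies for the two versions in this demo.
PROMPT_V1 = """You are a support agent.
Escalate refunds over $50 to a supervisor.
Always be polite."""

PROMPT_V2 = """You are a support agent.
Auto-approve refunds up to $200.
Always be polite."""

def changed_lines(old: str, new: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_line, new_line) for every replaced line."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            for k, (o, n) in enumerate(zip(old_lines[i1:i2], new_lines[j1:j2])):
                changes.append((i1 + k + 1, o, n))
    return changes

# Line 2 is where the threshold changed: $50 -> $200.
print(changed_lines(PROMPT_V1, PROMPT_V2))
```

Once the changed line is known, attribution can link each diverging decision (like the $120 refund) back to it.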

What the Demo Shows

Act 1 — Version & Bundle

Git-Native Prompt Versioning

Prompts are versioned via git tags. Policies (tools/context/safety) and model config are pinned alongside prompts into an immutable, SHA-256 content-addressed artifact — every bundle is reproducible and verifiable.
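Content addressing means the bundle's identity is a hash of its contents. A minimal sketch of the idea, assuming a canonical JSON serialization (this is not the library's actual bundle format, just an illustration of why identical inputs always produce the same hash):

```python
import hashlib
import json

def bundle_hash(prompts: dict, policies: dict, model_config: dict) -> str:
    """SHA-256 over a canonical serialization: same inputs, same hash."""
    canonical = json.dumps(
        {"prompts": prompts, "policies": policies, "model_config": model_config},
        sort_keys=True,           # key order can't change the hash
        separators=(",", ":"),    # no whitespace variation
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h1 = bundle_hash({"system": "Escalate refunds over $50."},
                 {"tools": ["refund"]}, {"model": "gpt-4o"})
h2 = bundle_hash({"system": "Escalate refunds over $50."},
                 {"tools": ["refund"]}, {"model": "gpt-4o"})
assert h1 == h2  # reproducible: rebuild the bundle, get the same address
```

Because any change to a prompt, policy, or model setting changes the hash, a bundle reference pins the exact artifact that ran.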

Act 2 — Promote

Eval-Gated Promotion & Rollback

Promotion to prod is blocked until a passing eval report exists. Bundles move dev → staging → prod with configurable quality gates. Rollback restores the previous version in one command with a full audit trail.
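The gate logic reduces to a simple predicate over the eval report. A minimal sketch, assuming a report with `passed`/`total` counts and a configurable pass-rate threshold (the real gate criteria and report schema are the library's, not shown here):

```python
def gate_passes(report: dict, min_pass_rate: float = 0.95) -> bool:
    """Block promotion unless a report exists and its pass rate clears the bar."""
    if not report:
        return False  # no eval report at all: promotion blocked
    return report["passed"] / report["total"] >= min_pass_rate

assert not gate_passes({})                          # missing report: blocked
assert not gate_passes({"passed": 3, "total": 5})   # 60% pass rate: blocked
assert gate_passes({"passed": 19, "total": 20})     # 95% pass rate: promoted
```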

Act 3 — Attribution

Behavior Attribution

Attribution scores each influence from 0.0 to 1.0 and labels it HIGH / MEDIUM / LOW. Each behavior gets a verdict of Expected, Unexpected, or Contradicts Artifacts, traced to the exact prompt line that caused it.
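The score-to-label mapping is a simple banding function. A minimal sketch; the band edges (0.66 and 0.33) are illustrative assumptions, not the library's actual thresholds:

```python
def attribution_label(score: float) -> str:
    """Map a 0.0-1.0 influence score to a HIGH / MEDIUM / LOW band.
    Band edges here are assumed for illustration."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.66:
        return "HIGH"
    if score >= 0.33:
        return "MEDIUM"
    return "LOW"

# The changed threshold line would score near 1.0 for the $120 decision.
print(attribution_label(0.92))  # HIGH
```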

Act 4 — Analytics

Behavioral Analytics

The real story isn't in latency — it's in what the agent does. Approval rate goes from 20% to 40%, escalation drops from 60% to 40%, while latency and tokens stay flat. Analytics surfaces that behavioral shift so you can decide whether it's the outcome you wanted.
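Those rates fall out of counting per-run outcomes. A minimal sketch (the outcome labels and the two run lists are illustrative, matching the five-request scenario above, not the demo's actual log format):

```python
from collections import Counter

def outcome_rates(outcomes: list[str]) -> dict[str, float]:
    """Share of each outcome (approve / escalate / deny) across runs."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: count / total for label, count in counts.items()}

# Same five requests, run under each prompt version.
v1 = ["escalate", "approve", "escalate", "escalate", "deny"]  # v1.0.0
v2 = ["escalate", "approve", "approve", "escalate", "deny"]   # v1.1.0

# Matches the shift described above: approvals 20% -> 40%, escalations 60% -> 40%.
assert outcome_rates(v1)["approve"] == 0.2
assert outcome_rates(v2)["approve"] == 0.4
assert outcome_rates(v1)["escalate"] == 0.6
assert outcome_rates(v2)["escalate"] == 0.4
```

Latency and token counts can stay flat while a shift like this happens, which is why outcome-level metrics matter.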

Works With Your Stack

Framework-agnostic by design. No LangChain, LangGraph, or framework-specific hooks required. Load a bundle with two lines and use it with any Python AI client.

Works with: OpenAI SDK, Anthropic SDK, LangChain, LangGraph, LlamaIndex, CrewAI, any Python framework.
runtime_loader.py
# Same two lines regardless of your AI framework
from llmhq_releaseops.runtime import RuntimeLoader

bundle, metadata = RuntimeLoader().load_bundle("support-agent@prod")
# bundle.prompts, bundle.policies, bundle.model_config — all resolved
# metadata auto-injected into OpenTelemetry spans

# OpenAI (assumes client = OpenAI() from the openai SDK)
client.chat.completions.create(
    model=bundle.model_config["model"],
    messages=[{"role": "system", "content": bundle.prompts["system"]}]
)

# Anthropic (system prompt is a top-level param; max_tokens and messages are required)
client.messages.create(
    model=bundle.model_config["model"],
    max_tokens=1024,
    system=bundle.prompts["system"],
    messages=[{"role": "user", "content": "I'd like a refund for my $120 purchase"}]
)

# LangChain / LlamaIndex / CrewAI — same bundle, different client

Built for Production

Framework agnostic
No LangChain or LangGraph dependency. Works with any Python agent or LLM client.
Git-native storage
All state in YAML files tracked by git. No database, no external service required.
Content-addressed
SHA-256 bundles are immutable. Identical inputs always produce the same hash.
OpenTelemetry ready
Bundle metadata auto-injected into OTel spans. Trace any action to an exact artifact version.
Eval-gated promotion
Promotion blocked until a passing eval report exists. Pluggable judges: exact match, regex, LLM-as-judge.
Instant rollback
One command restores the previous prod version. Full audit trail of every promotion decision.

Try It Yourself

The demo runs locally with no API keys. Install both packages and run the demo script.

pip install llmhq-promptops llmhq-releaseops
git clone https://github.com/llmhq-hub/releaseops
cd releaseops/examples
python demo.py