PromptOps + ReleaseOps Demo
Two prompt versions. Same customer requests. Different agent behavior. Watch every change get versioned, bundled, and traced.
The Scenario
A customer support agent handles 5 refund requests. One prompt change shifts the threshold.
The Key Moment: $120 Refund
The medium refund scenario ($120 purchase) is where behavior diverges. v1.0.0 escalates it to a supervisor (over the $50 limit), while v1.1.0 auto-approves it (under the $200 limit).
Attribution traces this decision back to the exact line in the system prompt where the threshold changed: $50 → $200.
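The divergence can be sketched as a tiny routing function. The `route_refund` name and the approve/escalate framing are illustrative; only the $50 and $200 limits and the $120 test case come from the demo:

```python
def route_refund(amount: float, auto_approve_limit: float) -> str:
    """Return the agent's action for a refund request (illustrative sketch)."""
    if amount <= auto_approve_limit:
        return "auto_approve"
    return "escalate"

# The same $120 request lands on opposite sides of the two thresholds.
print(route_refund(120, auto_approve_limit=50))   # v1.0.0 → escalate
print(route_refund(120, auto_approve_limit=200))  # v1.1.0 → auto_approve
```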
What the Demo Shows
Git-Native Prompt Versioning
Prompts are versioned via git tags. Policies (tools/context/safety) and model config are pinned alongside prompts into an immutable, SHA-256 content-addressed artifact — every bundle is reproducible and verifiable.
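Content addressing is easy to sketch. The `bundle_id` helper below is hypothetical, not the library's API, but it shows why hashing a canonical serialization makes every bundle reproducible and verifiable: identical inputs always yield the same address.

```python
import hashlib
import json

def bundle_id(prompts: dict, policies: dict, model_config: dict) -> str:
    """Content-address a bundle by hashing a canonical JSON serialization."""
    canonical = json.dumps(
        {"prompts": prompts, "policies": policies, "model_config": model_config},
        sort_keys=True,           # key order never affects the hash
        separators=(",", ":"),    # no whitespace variation
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

# Rebuilding the same pinned content reproduces the same 64-hex-char id.
a = bundle_id({"system": "v1"}, {}, {"model": "gpt-4o"})
b = bundle_id({"system": "v1"}, {}, {"model": "gpt-4o"})
assert a == b
```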
Eval-Gated Promotion & Rollback
Promotion to prod is blocked until a passing eval report exists. Bundles move dev → staging → prod with configurable quality gates. Rollback restores the previous version in one command with a full audit trail.
Behavior Attribution
Attribution scores each influence from 0.0 to 1.0 and labels it HIGH, MEDIUM, or LOW. The verdict for each behavior is Expected, Unexpected, or Contradicts Artifacts, traced back to the exact prompt line that caused it.
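The bucketing might look like the sketch below; the 0.4 and 0.7 cutoffs are assumptions for illustration, not the tool's actual bands:

```python
def label_influence(score: float) -> str:
    """Bucket a 0.0-1.0 influence score; band edges here are illustrative."""
    if score >= 0.7:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"

# A threshold change that directly flips a decision scores near the top.
print(label_influence(0.85))  # HIGH
```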
Behavioral Analytics
The real story isn't in latency — it's in what the agent does. Approval rate goes from 20% to 40%, escalation drops from 60% to 40%, while latency and tokens stay flat. Analytics surfaces that behavioral shift so you can decide whether it's the outcome you wanted.
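Those rates fall out of a simple frequency count over the agent's actions across the five requests. The action names below are illustrative, and the fifth action is a placeholder; only the 20%/60% → 40%/40% shift comes from the demo:

```python
from collections import Counter

def behavior_rates(actions: list[str]) -> dict[str, float]:
    """Share of each action across a run of requests."""
    counts = Counter(actions)
    return {action: counts[action] / len(actions) for action in counts}

v1 = ["approve", "escalate", "escalate", "escalate", "deny"]
v2 = ["approve", "approve", "escalate", "escalate", "deny"]
print(behavior_rates(v1))  # approve 0.2, escalate 0.6 under v1.0.0
print(behavior_rates(v2))  # approve 0.4, escalate 0.4 under v1.1.0
```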
Works With Your Stack
Framework-agnostic by design. No LangChain, LangGraph, or framework-specific hooks required. Load a bundle with two lines and use it with any Python AI client.
```python
# Same two lines regardless of your AI framework
from llmhq_releaseops.runtime import RuntimeLoader

bundle, metadata = RuntimeLoader().load_bundle("support-agent@prod")
# bundle.prompts, bundle.policies, bundle.model_config — all resolved
# metadata auto-injected into OpenTelemetry spans

# OpenAI
from openai import OpenAI

client = OpenAI()
client.chat.completions.create(
    model=bundle.model_config["model"],
    messages=[{"role": "system", "content": bundle.prompts["system"]}],
)

# Anthropic
from anthropic import Anthropic

client = Anthropic()
client.messages.create(
    model=bundle.model_config["model"],
    max_tokens=1024,  # required by the Anthropic Messages API
    system=bundle.prompts["system"],
    messages=[{"role": "user", "content": "I'd like a refund."}],
)

# LangChain / LlamaIndex / CrewAI — same bundle, different client
```
Built for Production
Try It Yourself
The demo runs locally with no API keys. Install both packages and run the demo script.
pip install llmhq-promptops llmhq-releaseops
git clone https://github.com/llmhq-hub/releaseops
cd releaseops/examples
python demo.py