Changed a prompt. Agent behaved differently in production. Which line?

Version, bundle, and ship AI agent behavior like software.

Prompts, policies, and model config define your agent — but most teams deploy them without versioning, testing, or rollback. PromptOps versions every prompt automatically. ReleaseOps bundles them with policies, promotes through gated environments, and traces exactly why behavior changed.

One workflow, two tools

1. Write: YAML prompt template with variables
2. Version: Git commit auto-tags v1.2.0
3. Bundle: Prompt + policy + model config
4. Promote: dev → staging → prod with gates
5. Monitor: Attribution + behavioral analytics

PromptOps handles writing and versioning; ReleaseOps handles bundling, promotion, and monitoring.

How it works

1. PromptOps: Version your prompts

Write prompts as YAML templates with variables. PromptOps auto-versions them on every git commit — semantic tags, diff tracking, and version history out of the box.

Reference any version in code: :v1.2.0, :latest, or even :unstaged for testing uncommitted changes.

support-system.yaml
id: support-system
description: Customer support agent
variables:
  customer_name: { required: true }
  request: { required: true }
template: |
  You are a support agent for Acme Corp.

  REFUND POLICY:
  - Auto-approve refunds up to $200
  - Escalate refunds over $200
  - Never approve if customer is abusive
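A minimal sketch of what template rendering like this could look like in plain Python. This is illustrative only, not the PromptOps API: the `render` helper and the inline `prompt_spec` (a condensed stand-in for the YAML above) are invented for this sketch.

```python
# Illustrative sketch, not the PromptOps API: validate required
# variables, then substitute them into the template body.
from string import Template

# Hypothetical in-memory equivalent of a YAML prompt spec.
prompt_spec = {
    "variables": {
        "customer_name": {"required": True},
        "request": {"required": True},
    },
    "template": (
        "You are a support agent for Acme Corp.\n"
        "Customer: $customer_name\n"
        "Request: $request"
    ),
}

def render(spec, **values):
    # Fail fast if any required variable is missing.
    missing = [name for name, opts in spec["variables"].items()
               if opts.get("required") and name not in values]
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    return Template(spec["template"]).substitute(values)

print(render(prompt_spec, customer_name="Dana", request="Refund for order #1412"))
```

Keeping the spec declarative like this is what makes auto-versioning cheap: the template is data, so a git diff of the YAML is a diff of agent behavior.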
2. ReleaseOps: Bundle and promote

ReleaseOps reads your versioned prompt via PromptBridge and bundles it with tool policies and model config into an immutable, SHA-256 content-addressed artifact.

Promote through environments with eval gates. Roll back instantly. Every action is recorded in an audit trail.

app.py
from llmhq_releaseops.runtime import RuntimeLoader

loader = RuntimeLoader()
bundle, metadata = loader.load_bundle("support-agent@prod")

# Everything resolved and verified
model    = bundle.model_config.model       # claude-sonnet-4-5
prompts  = bundle.prompts                  # versioned refs
policies = bundle.policies                 # tool access rules

# Metadata auto-injected into OTel spans
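The idea behind a content-addressed artifact can be sketched in a few lines of stdlib Python. This is illustrative, not the ReleaseOps bundle format: the sample `bundle` dict and `content_address` helper are assumptions made for the sketch.

```python
# Sketch of SHA-256 content addressing (not the ReleaseOps format):
# hash a canonical JSON serialization of the bundle, so any change to
# prompts, policies, or model config produces a new artifact ID.
import hashlib
import json

# Hypothetical bundle contents for illustration.
bundle = {
    "prompts": {"support-system": "v1.1.0"},
    "policies": {"approve_refund": {"max_amount": 200}},
    "model_config": {"model": "claude-sonnet-4-5", "temperature": 0.2},
}

def content_address(obj):
    # sort_keys + fixed separators give a canonical byte string,
    # so logically identical bundles always hash the same.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

digest = content_address(bundle)
```

Because the ID is derived from the content, the artifact is immutable by construction: editing any field yields a different address, and two environments holding the same digest are provably running the same bundle.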
3. Attribution: Know why behavior changed

When behavior shifts between versions, attribution traces each agent action back to the specific prompt lines and policy rules that influenced it.

Pattern matching with confidence scoring — not causal claims. Points engineers to the right place to investigate.

terminal
# Why did v1.0.0 ESCALATE the $120 refund?
Primary influence (confidence: 0.70):
  Source: prompt (support-system@v1.0.0)
  Line 15: "Escalate any refund over $50"

# Why did v1.1.0 APPROVE it?
Primary influence (confidence: 0.70):
  Source: prompt (support-system@v1.1.0)
  Line 13: "Auto-approve up to $200"
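As a rough illustration, pattern matching of this kind can be sketched as token-overlap scoring. This is a toy heuristic, not the actual ReleaseOps attribution algorithm; `attribute` and the sample data are invented for this sketch.

```python
# Toy attribution heuristic (assumed for illustration, not the
# ReleaseOps algorithm): score each prompt line by token overlap
# with the observed action, and report the best match.

def attribute(action_tokens, prompt_lines):
    best = None
    for lineno, line in enumerate(prompt_lines, start=1):
        tokens = set(line.lower().split())
        overlap = len(tokens & set(action_tokens))
        # Confidence is the fraction of action tokens matched.
        score = overlap / max(len(action_tokens), 1)
        if best is None or score > best[0]:
            best = (score, lineno, line)
    return {"confidence": round(best[0], 2), "line": best[1], "text": best[2]}

prompt_v1 = ["You are a support agent.", "Escalate any refund over $50"]
result = attribute(["escalate", "ticket", "refund", "$120"], prompt_v1)
# → points at line 2, "Escalate any refund over $50", confidence 0.5
```

The real value is in the framing the sketch shares with the tool: the output is a ranked pointer with a confidence score, something an engineer verifies by reading the cited line, not a causal proof.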

The key moment: one line changes everything

v1.0.0 (Conservative): "Escalate any refund over $50" → $120 refund → escalate_ticket
v1.1.0 (Permissive): "Auto-approve up to $200" → $120 refund → approve_refund
Attribution traced to: line 15 in system prompt — threshold changed from $50 to $200
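The divergence boils down to a single threshold. A hypothetical decision function (invented here to make the comparison concrete) shows how the same $120 request flips between the two versions:

```python
# Illustrative only: the two prompt versions reduced to one threshold.
def decide(refund_amount, auto_approve_limit):
    """Return the agent action for a refund request."""
    if refund_amount <= auto_approve_limit:
        return "approve_refund"
    return "escalate_ticket"

decide(120, 50)   # v1.0.0 threshold → "escalate_ticket"
decide(120, 200)  # v1.1.0 threshold → "approve_refund"
```

Same input, same model, different prompt line: this is why versioning the prompt is versioning the behavior.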

See the full workflow

The interactive demo runs both tools end-to-end with a real scenario: five customer requests, two prompt versions, one behavioral divergence. No API keys needed.

Get started in seconds

Install both, or start with either — they work standalone or together.

pip install llmhq-promptops llmhq-releaseops

Community