Every artifact in your stack has version control. Except the ones that define your agent.
Version, bundle, and ship the artifacts that define your agent. Git-native. Local-first.
Prompts. Policies. Model configs. These define your agent — and right now, they're unversioned.
The key moment: one line changes everything
How it works
Version your prompts
Write prompts as YAML templates with variables. PromptOps auto-versions them on every git commit — semantic tags, diff tracking, and version history out of the box.
Reference any version in code: :v1.2.0, :latest, or even :unstaged for testing uncommitted changes.
id: support-system
description: Customer support agent
variables:
  customer_name: { required: true }
  request: { required: true }
template: |
  You are a support agent for Acme Corp.

  REFUND POLICY:
  - Auto-approve refunds up to $200
  - Escalate refunds over $200
  - Never approve if customer is abusive
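Rendering a template like the one above comes down to variable substitution plus a required-variable check. A minimal stdlib sketch of that model (assuming {{ var }}-style placeholders; an illustration of the idea, not the PromptOps API):

```python
import re

def render(template: str, variables: dict) -> str:
    """Substitute {{ name }} placeholders; fail loudly on a missing required variable."""
    def sub(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing required variable: {key}")
        return str(variables[key])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

rendered = render(
    "You are a support agent for Acme Corp. Customer: {{ customer_name }}.",
    {"customer_name": "Dana"},
)
```

Because rendering is deterministic, the same template version plus the same variables always yields the same prompt, which is what makes version references like :v1.2.0 reproducible.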
Bundle and promote
ReleaseOps reads your versioned prompts, bundles them with tool policies and model config into an immutable, SHA-256 content-addressed artifact.
Promote through environments with eval gates. Rollback in one command. Every action recorded in an audit trail.
from llmhq_releaseops.runtime import RuntimeLoader
loader = RuntimeLoader()
content = loader.load_bundle_content("support-agent@prod")
# Everything resolved and ready to use
model = content["model"] # {"model": "claude-sonnet-4-5", ...}
prompts = content["prompts"] # {"system": "You are a support agent..."}
policies = content["policies"] # {"tools": {"allowed": [...]}, ...}
# Metadata auto-injected into OTel spans (silent no-op if OTel not configured)
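Content addressing means a bundle's identity is a hash of its bytes: identical content always gets the same ID, and any change to a prompt, policy, or model config changes the address. A stdlib sketch of the idea (illustrative; not ReleaseOps internals):

```python
import hashlib
import json

bundle = {
    "model": {"model": "claude-sonnet-4-5"},
    "prompts": {"system": "You are a support agent for Acme Corp."},
    "policies": {"tools": {"allowed": ["lookup_order", "issue_refund"]}},
}

# Canonical serialization: identical content always produces identical bytes,
# so the SHA-256 digest is a stable, tamper-evident address for the bundle.
canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode()
digest = hashlib.sha256(canonical).hexdigest()
address = f"support-agent@sha256:{digest[:12]}"
```

This is why the artifact is immutable: you cannot edit a content-addressed bundle in place, only produce a new one with a new address.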
Know why behavior changed
When behavior shifts between versions, attribution traces each agent action back to the specific prompt lines and policy rules that influenced it.
Pattern matching with confidence scoring — not causal claims. Points engineers to the right place to investigate.
# Why did v1.0.0 ESCALATE the $120 refund?
Primary influence (confidence: 0.82, HIGH):
Source: prompt (support-system@v1.0.0)
Line 15: "Escalate any refund over $50"
# Why did v1.1.0 APPROVE it?
Primary influence (confidence: 0.82, HIGH):
Source: prompt (support-system@v1.1.0)
Line 13: "Auto-approve up to $200"
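The attribution output above is pattern matching, and the core move can be sketched in a few lines: score each prompt line by how strongly it matches the observed action, then surface the top match with its score as a confidence hint. A toy sketch (deliberately simplified; not LLMhq's actual scoring algorithm):

```python
def influence_score(action: str, prompt_line: str) -> float:
    """Toy relevance score: fraction of action words appearing in a prompt line.
    Substring matching, so "refund" also matches "refunds"."""
    words = action.lower().split()
    line = prompt_line.lower()
    return sum(1 for w in words if w in line) / len(words)

policy_lines = [
    "Auto-approve refunds up to $200",
    "Escalate refunds over $200",
    "Never approve if customer is abusive",
]
action = "escalate refund"
best = max(policy_lines, key=lambda ln: influence_score(action, ln))
```

The point of scoring rather than asserting is exactly the hedge in the output above: a high-confidence match points an engineer at the right line to investigate, without claiming that line caused the behavior.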
Compare versions
Aggregate behavioral metrics per version — latency percentiles, token usage, tool call distributions, error rates. Compare any two versions with weighted significance levels.
Overall assessment: improvement, regression, neutral, or mixed. Integrates with OpenTelemetry and LangSmith.
# Compare behavioral metrics across versions
releaseops analytics compare support-agent@1.0.0 support-agent@1.1.0
Metric Baseline Candidate Change Significance
------------------- --------- ---------- --------- -----------
error_rate 0.00 0.00 0.0% negligible
avg_latency_ms 124.44 124.44 0.0% negligible
approve_refund 1/5 2/5 +100% major
escalate_ticket 3/5 2/5 -33% major
Overall: neutral (performance stable, behavior shifted)
Fits into what you already run
LLMhq sits between your agent artifacts and your existing infrastructure. It doesn't replace anything — it adds version control, release engineering, and behavioral observability to whatever you're already running.
Get started in seconds
Start with PromptOps. Add ReleaseOps when you need bundles, promotion, and attribution.
# Start with prompt versioning
pip install llmhq-promptops
promptops init repo
# Add release engineering when you're ready
pip install llmhq-releaseops
releaseops init
# Or install everything
pip install llmhq-promptops llmhq-releaseops
What makes this different
Git-Native
Powered by the same ol' git.
Your prompts, bundles, and promotion history are YAML files in your repo.
No new systems to learn. git log is the audit trail.
git diff shows what changed.
Local-First
Your prompts stay in your repo. Your logic stays on your machine.
Nothing phones home. No API keys required to version prompts. No SaaS dashboard. Run everything locally with zero external dependencies.
No Lock-In
MIT licensed. Walk away anytime.
Framework-agnostic. Works with OpenAI, Anthropic, or local models. All artifacts are plain YAML in git. Stop using LLMhq tomorrow — everything is still in your repo.
Works With Your Existing Stack
Adds to your tools. Replaces none of them.
Already using OpenTelemetry? Release metadata auto-injects into your existing spans. Using LangSmith? Query your existing traces filtered by bundle version. Not using either? Everything still works — observability integrations are additive, never required.
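The "additive, never required" pattern is easy to see in code: the OpenTelemetry API ships a no-op tracer, so attaching metadata is safe whether or not an SDK, or even the package itself, is present. A sketch of that pattern (illustrative; the llmhq.bundle attribute name here is an assumption, not a documented key):

```python
def attach_release_metadata(bundle_ref: str) -> bool:
    """Attach bundle metadata to the active span; return True only if recorded."""
    try:
        from opentelemetry import trace
    except ImportError:
        return False  # OpenTelemetry not installed: silent no-op
    span = trace.get_current_span()
    if not span.is_recording():  # default no-op tracer when no SDK is configured
        return False
    span.set_attribute("llmhq.bundle", bundle_ref)
    return True

attach_release_metadata("support-agent@prod")
```

With no SDK configured the function returns False and nothing is emitted, which is the "silent no-op" behavior noted in the runtime example above.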
See the full workflow
The interactive demos run both tools end-to-end with real scenarios. No API keys needed.