Every artifact in your stack has version control. Except the ones that define your agent.
Version, bundle, and ship the artifacts that define your agent. Git-native. Local-first.
Prompts. Policies. Model configs. These define your agent — and right now, they're unversioned.
The key moment: one line changes everything
How it works
Version your prompts
Write prompts as YAML templates with variables. PromptOps auto-versions them on every git commit — semantic tags, diff tracking, and version history out of the box.
Reference any version in code: :v1.2.0, :latest, or even :unstaged for testing uncommitted changes.
id: support-system
description: Customer support agent
variables:
  customer_name: { required: true }
  request: { required: true }
template: |
  You are a support agent for Acme Corp.

  REFUND POLICY:
  - Auto-approve refunds up to $200
  - Escalate refunds over $200
  - Never approve if customer is abusive
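Rendering a template like the one above comes down to variable substitution plus a required-variable check. A minimal stdlib sketch of that model (assuming {{ var }}-style placeholders; an illustration of the idea, not the PromptOps API):

```python
import re

def render(template: str, variables: dict) -> str:
    """Substitute {{ name }} placeholders; fail loudly on a missing required variable."""
    def sub(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing required variable: {key}")
        return str(variables[key])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

rendered = render(
    "You are a support agent for Acme Corp. Customer: {{ customer_name }}.",
    {"customer_name": "Dana"},
)
```

Because rendering is deterministic, the same template version plus the same variables always yields the same prompt, which is what makes version references like :v1.2.0 reproducible.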
Bundle and promote
ReleaseOps reads your versioned prompts, bundles them with tool policies and model config into an immutable, SHA-256 content-addressed artifact.
Promote through environments with eval gates. Rollback in one command. Every action recorded in an audit trail.
from llmhq_releaseops.runtime import RuntimeLoader
loader = RuntimeLoader()
content = loader.load_bundle_content("support-agent@prod")
# Everything resolved and ready to use
model = content["model"] # {"model": "claude-sonnet-4-5", ...}
prompts = content["prompts"] # {"system": "You are a support agent..."}
policies = content["policies"] # {"tools": {"allowed": [...]}, ...}
# Metadata auto-injected into OTel spans (silent no-op if OTel not configured)
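Content addressing means a bundle's identity is a hash of its bytes: identical content always gets the same ID, and any change to a prompt, policy, or model config changes the address. A stdlib sketch of the idea (illustrative; not ReleaseOps internals):

```python
import hashlib
import json

bundle = {
    "model": {"model": "claude-sonnet-4-5"},
    "prompts": {"system": "You are a support agent for Acme Corp."},
    "policies": {"tools": {"allowed": ["lookup_order", "issue_refund"]}},
}

# Canonical serialization: identical content always produces identical bytes,
# so the SHA-256 digest is a stable, tamper-evident address for the bundle.
canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode()
digest = hashlib.sha256(canonical).hexdigest()
address = f"support-agent@sha256:{digest[:12]}"
```

This is why the artifact is immutable: you cannot edit a content-addressed bundle in place, only produce a new one with a new address.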
Know why behavior changed
When behavior shifts between versions, attribution traces each agent action back to the specific prompt lines and policy rules that influenced it.
Pattern matching with confidence scoring — not causal claims. Points engineers to the right place to investigate.
# Why did v1.0.0 ESCALATE the $120 refund?
Primary influence (confidence: 0.82, HIGH):
Source: prompt (support-system@v1.0.0)
Line 15: "Escalate any refund over $50"
# Why did v1.1.0 APPROVE it?
Primary influence (confidence: 0.82, HIGH):
Source: prompt (support-system@v1.1.0)
Line 13: "Auto-approve up to $200"
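The attribution output above is pattern matching, and the core move can be sketched in a few lines: score each prompt line by how strongly it matches the observed action, then surface the top match with its score as a confidence hint. A toy sketch (deliberately simplified; not LLMhq's actual scoring algorithm):

```python
def influence_score(action: str, prompt_line: str) -> float:
    """Toy relevance score: fraction of action words appearing in a prompt line.
    Substring matching, so "refund" also matches "refunds"."""
    words = action.lower().split()
    line = prompt_line.lower()
    return sum(1 for w in words if w in line) / len(words)

policy_lines = [
    "Auto-approve refunds up to $200",
    "Escalate refunds over $200",
    "Never approve if customer is abusive",
]
action = "escalate refund"
best = max(policy_lines, key=lambda ln: influence_score(action, ln))
```

The point of scoring rather than asserting is exactly the hedge in the output above: a high-confidence match points an engineer at the right line to investigate, without claiming that line caused the behavior.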
Compare versions
Aggregate behavioral metrics per version — latency percentiles, token usage, tool call distributions, error rates. Compare any two versions with weighted significance levels.
Overall assessment: improvement, regression, neutral, or mixed. Integrates with OpenTelemetry and LangSmith.
# Compare behavioral metrics across versions
releaseops analytics compare support-agent@1.0.0 support-agent@1.1.0
Metric Baseline Candidate Change Significance
------------------- --------- ---------- --------- -----------
error_rate 0.00 0.00 0.0% negligible
avg_latency_ms 124.44 124.44 0.0% negligible
approve_refund 1/5 2/5 +100% major
escalate_ticket 3/5 2/5 -33% major
Overall: neutral (performance stable, behavior shifted)
Fits into what you already run
LLMhq sits between your agent artifacts and your existing infrastructure. It doesn't replace anything — it adds version control, release engineering, and behavioral observability to whatever you're already running.
Get started in seconds
Start with PromptOps. Add ReleaseOps when you need bundles, promotion, and attribution.
# Start with prompt versioning
pip install llmhq-promptops
promptops init repo
# Add release engineering when you're ready
pip install llmhq-releaseops
releaseops init
# Or install everything
pip install llmhq-promptops llmhq-releaseops
What makes this different
Git-Native
Powered by the same ol' git.
Your prompts, bundles, and promotion history are YAML files in your repo.
No new systems to learn. git log is the audit trail.
git diff shows what changed.
Local-First
Your prompts stay in your repo. Your logic stays on your machine.
Nothing phones home. No API keys required to version prompts. No SaaS dashboard. Run everything locally with zero external dependencies.
No Lock-In
MIT licensed. Walk away anytime.
Framework-agnostic. Works with OpenAI, Anthropic, or local models. All artifacts are plain YAML in git. Stop using LLMhq tomorrow — everything is still in your repo.
Works With Your Existing Stack
Adds to your tools. Replaces none of them.
Already using OpenTelemetry? Release metadata auto-injects into your existing spans. Using LangSmith? Query your existing traces filtered by bundle version. Not using either? Everything still works — observability integrations are additive, never required.
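The "additive, never required" pattern is easy to see in code: the OpenTelemetry API ships a no-op tracer, so attaching metadata is safe whether or not an SDK, or even the package itself, is present. A sketch of that pattern (illustrative; the llmhq.bundle attribute name here is an assumption, not a documented key):

```python
def attach_release_metadata(bundle_ref: str) -> bool:
    """Attach bundle metadata to the active span; return True only if recorded."""
    try:
        from opentelemetry import trace
    except ImportError:
        return False  # OpenTelemetry not installed: silent no-op
    span = trace.get_current_span()
    if not span.is_recording():  # default no-op tracer when no SDK is configured
        return False
    span.set_attribute("llmhq.bundle", bundle_ref)
    return True

attach_release_metadata("support-agent@prod")
```

With no SDK configured the function returns False and nothing is emitted, which is the "silent no-op" behavior noted in the runtime example above.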
See the full workflow
The interactive demos run both tools end-to-end with real scenarios. No API keys needed.