ReleaseOps

Release engineering for AI agent behavior — bundle, promote, evaluate, and observe behavior artifacts

Early Access · v0.1.0 · Python 3.10+

Why ReleaseOps?

AI agents ship behavior through prompts, policies, and model configurations — not deterministic code. When something breaks in production, there's no git blame for "why did the agent start approving refunds it shouldn't?"

ReleaseOps brings standard release engineering to these behavior artifacts, so you always know what's running, what changed, and why.

Features

Bundle Creation

Compose prompts, policies, and model configs into immutable, content-addressed artifacts verified with SHA-256.
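Content addressing can be sketched with the standard library: hash a canonical JSON encoding of the manifest, so identical content always yields the same bundle ID and any change yields a new one. The manifest fields below are illustrative, not ReleaseOps' actual schema.

```python
import hashlib
import json

def bundle_digest(manifest: dict) -> str:
    """Content-address a bundle manifest: identical content -> identical ID."""
    # Canonical form: sorted keys, no incidental whitespace differences.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

manifest = {
    "name": "support-agent",
    "prompts": {"system": "onboarding:v1.2.0"},
    "model_config": {"model": "claude-sonnet-4-5", "temperature": 0.7},
}

digest = bundle_digest(manifest)

# Any change, however small, produces a different address.
changed = {**manifest, "model_config": {"model": "claude-sonnet-4-5", "temperature": 0.2}}
assert bundle_digest(changed) != digest
```

Because the digest depends only on content, re-verifying a bundle at load time is a single hash comparison.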

Gated Promotion

Move bundles through dev → staging → prod with configurable quality gates: evaluation, approval, and soak time.

Instant Rollback

Revert to any previous bundle version instantly with a full audit trail of every promotion and rollback.

Automated Evaluation

Run test suites with pluggable judges: exact match, contains, regex, LLM-as-judge, or composite.

OpenTelemetry Integration

Automatically inject bundle metadata into OTel spans for production observability and tracing.
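Conceptually, the injected metadata amounts to a set of attributes on the active span. A minimal sketch of such a payload (the attribute key names are invented for illustration, not ReleaseOps' actual keys):

```python
def bundle_span_attributes(name: str, version: str,
                           env: str, digest: str) -> dict[str, str]:
    """Flatten bundle identity into span attributes (hypothetical key names)."""
    return {
        "releaseops.bundle.name": name,
        "releaseops.bundle.version": version,
        "releaseops.environment": env,
        "releaseops.bundle.digest": digest,
    }

attrs = bundle_span_attributes("support-agent", "1.1.0", "prod", "ab12cd34")
```

With the OpenTelemetry SDK, each key/value pair would be attached via `span.set_attribute(key, value)`, so every production trace carries the exact bundle identity that produced it.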

Behavior Attribution

Trace agent behavior back to specific prompt lines and policy rules with confidence scoring.

Installation

```bash
pip install llmhq-releaseops
```

| Extra | Install | Adds |
| --- | --- | --- |
| `eval` | `pip install llmhq-releaseops[eval]` | LLM-as-judge (OpenAI, Anthropic) |
| `langsmith` | `pip install llmhq-releaseops[langsmith]` | LangSmith trace queries |
| `dev` | `pip install llmhq-releaseops[dev]` | pytest, black, mypy |

Quick Start

```bash
# Initialize release infrastructure
releaseops init

# Create a bundle from prompts and model config
releaseops bundle create support-agent \
  --artifact system=onboarding:v1.2.0 \
  --model claude-sonnet-4-5 --provider anthropic

# Promote through environments
releaseops promote promote support-agent 1.0.0 dev
releaseops promote promote support-agent 1.0.0 staging
releaseops promote promote support-agent 1.0.0 prod

# Check environment status
releaseops env list

# Compare versions when something changes
releaseops analytics compare support-agent@1.0.0 support-agent@1.1.0
```

Python SDK

```python
# app.py
from llmhq_releaseops.runtime import RuntimeLoader

loader = RuntimeLoader()
bundle, metadata = loader.load_bundle("support-agent@prod")

# Access bundle data
model       = bundle.model_config.model        # "claude-sonnet-4-5"
temperature = bundle.model_config.temperature  # 0.7
prompts     = bundle.prompts                   # Dict[str, ArtifactRef]
policies    = bundle.policies                  # Dict[str, ArtifactRef]

# Metadata auto-injected into OpenTelemetry spans
```

Async support

```python
# async_app.py
import asyncio
from llmhq_releaseops.runtime import AsyncRuntimeLoader

async def main():
    async_loader = AsyncRuntimeLoader()
    bundle, metadata = await async_loader.load_bundle("support-agent@prod")

asyncio.run(main())
```

Evaluation Engine

Pluggable judge system for testing agent behavior before promotion. Eval reports gate promotion — a bundle cannot reach prod without a passing report.

Judge types

ExactMatch · Contains · Regex · LLM-as-judge · Composite
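The non-LLM judges reduce to simple predicates, and a composite judge just combines them. A rough sketch with illustrative signatures (an LLM judge would implement the same callable shape):

```python
import re
from typing import Callable

Judge = Callable[[str, str], bool]  # (output, expected) -> passed

def exact_match(output: str, expected: str) -> bool:
    return output.strip() == expected.strip()

def contains(output: str, expected: str) -> bool:
    return expected.lower() in output.lower()

def regex(output: str, expected: str) -> bool:
    return re.search(expected, output) is not None

def composite(*judges: Judge) -> Judge:
    """All sub-judges must pass for the assertion to pass."""
    def run(output: str, expected: str) -> bool:
        return all(judge(output, expected) for judge in judges)
    return run

output = "Refund approved: $30 will be returned within 5 days."
assert contains(output, "approved")
assert regex(output, r"\$\d+")
```

Treating each judge as a plain `(output, expected) -> bool` callable is what makes the system pluggable: a new judge type only needs to satisfy that shape.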
```bash
# Create and run an eval suite
releaseops eval create support-eval --bundle support-agent@dev
releaseops eval run support-eval
releaseops eval report support-eval          # markdown or JSON output

# Promotion is BLOCKED if no passing eval report exists
releaseops promote promote support-agent 1.1.0 prod
# Error: no passing eval report for support-agent 1.1.0

# Override with --skip-gates (emergency only)
releaseops promote promote support-agent 1.1.0 prod --skip-gates
```

Python eval suite

```python
# eval_suite.py
from llmhq_releaseops.models.eval_suite import (
    EvalSuite, EvalCase, Assertion, JudgeType
)

suite = EvalSuite(
    id="support-eval",
    cases=[
        EvalCase(
            id="small-refund",
            input={"amount": "$30", "reason": "item not received"},
            assertions=[
                Assertion(
                    judge=JudgeType.CONTAINS,
                    expected="approved"
                )
            ]
        ),
        EvalCase(
            id="medium-refund",
            input={"amount": "$120", "reason": "changed mind"},
            assertions=[
                Assertion(
                    judge=JudgeType.LLM,
                    expected="agent should not escalate"
                )
            ]
        ),
    ]
)
```

Behavior Attribution

When behavior changes between versions, attribution traces the agent action back to the exact artifact lines that influenced it — prompt lines, policy rules, or model config — with confidence scoring.
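As a toy model of confidence scoring (the real analyzer is certainly richer), one can rank artifact lines by token overlap with the observed action and report the best match with its score:

```python
def token_overlap(action: str, line: str) -> float:
    """Crude confidence: fraction of action tokens present in the artifact line."""
    action_tokens = set(action.lower().split())
    line_tokens = set(line.lower().split())
    return len(action_tokens & line_tokens) / max(len(action_tokens), 1)

def explain(action: str, artifact_lines: dict[int, str]) -> tuple[int, str, float]:
    """Return the (line number, text, confidence) that best explains the action."""
    best = max(artifact_lines.items(), key=lambda kv: token_overlap(action, kv[1]))
    lineno, text = best
    return lineno, text, round(token_overlap(action, text), 2)

# Hypothetical numbered prompt lines, mirroring the example output below.
prompt = {
    12: "Greet the customer by name.",
    15: "Auto-approve refund requests up to $200",
    18: "Escalate chargebacks to a human agent.",
}
lineno, text, confidence = explain("approved refund for $120", prompt)
assert lineno == 15
```

The point of the sketch is the output shape: attribution names a specific artifact line, not just an artifact, which is what makes "why did the agent do that?" answerable.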

```bash
releaseops attribution explain support-agent 1.1.0 \
  --action "approved refund for $120"

# Primary influence: system_prompt  (HIGH, confidence 0.87)
#   Line 15: "Auto-approve refund requests up to $200"
#
# Secondary: tools_policy  (LOW, confidence 0.22)
#   Section: refund_tool — no relevant constraint found
#
# Overall assessment: Expected
```

Verdicts: Expected — behavior matches artifact intent · Unexpected — behavior deviates · Contradicts artifacts — behavior opposes an explicit rule.

Python API

```python
# attribution.py
from llmhq_releaseops.attribution.analyzer import AttributionAnalyzer

analyzer = AttributionAnalyzer(store, prompt_bridge)
attribution = analyzer.analyze(trace_data, "support-agent", "1.1.0")

print(attribution.primary_influence)   # Highest-confidence explanation
print(attribution.overall_assessment)  # "Expected" / "Unexpected" / "Contradicts artifacts"
```

Batch analysis runs from the CLI:

```bash
releaseops attribution analyze-batch support-agent 1.1.0 --traces traces.json
```

LangSmith Integration

Query LangSmith traces filtered by ReleaseOps bundle metadata. Aggregate behavioral metrics directly from your LangSmith project.

```bash
pip install llmhq-releaseops[langsmith]
```

```python
# langsmith_analytics.py
import os
from llmhq_releaseops.analytics.platforms.langsmith import LangSmithPlatform
from llmhq_releaseops.analytics import TraceQuerier, MetricsAggregator

platform = LangSmithPlatform(api_key=os.environ["LANGSMITH_API_KEY"])
querier = TraceQuerier(platform)

# Fetch traces tagged with ReleaseOps bundle metadata
traces = querier.query_by_bundle("support-agent", "1.1.0", "prod")
metrics = MetricsAggregator().aggregate(traces, "support-agent", "1.1.0", "prod")
```

CLI equivalent:

```bash
LANGSMITH_API_KEY=... releaseops analytics metrics support-agent@prod
LANGSMITH_API_KEY=... releaseops analytics compare support-agent@1.0.0 support-agent@1.1.0
```
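What the aggregation step computes can be sketched over plain trace dicts. The field names below are invented for illustration; the real `MetricsAggregator` operates on LangSmith runs:

```python
from statistics import mean, quantiles

def aggregate(traces: list[dict]) -> dict:
    """Summarize traces already filtered to one bundle@environment."""
    latencies = sorted(t["latency_ms"] for t in traces)
    errors = sum(1 for t in traces if t["error"])
    return {
        "count": len(traces),
        "error_rate": errors / len(traces),
        "latency_p50_ms": quantiles(latencies, n=100)[49],  # median cut point
        "latency_mean_ms": mean(latencies),
    }

traces = [
    {"latency_ms": 420, "error": False},
    {"latency_ms": 480, "error": False},
    {"latency_ms": 1900, "error": True},
    {"latency_ms": 510, "error": False},
]
summary = aggregate(traces)
assert summary["error_rate"] == 0.25
```

Computing the same summary for two bundle versions and diffing the numbers is essentially what a version comparison reports.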

Key concepts

Bundle
Immutable, content-addressed manifest of prompts + policies + model config (SHA-256 verified)
Environment
Named deployment target (dev/staging/prod) with a pinned bundle version
Promotion
Moving a bundle through environments with optional quality gates (eval, approval, soak)
Telemetry
Automatic injection of bundle metadata into OpenTelemetry spans
Attribution
Trace agent behavior back to specific prompt lines and policy rules
Analytics
Aggregate behavioral metrics and compare versions to quantify behavioral shifts

Requirements

  • Python 3.10+
  • Git (required for storage)
  • Dependencies: Typer, PyYAML, Jinja2, GitPython, OpenTelemetry, llmhq-promptops

See it in action

Watch ReleaseOps bundle prompts, promote through environments, and trace behavioral changes.